Commit Graph

2132 Commits

Author SHA1 Message Date
Robert Haas dc02c7bca4 Fix wrong costing of Sort under Gather Merge.
There's no mechanism for such a sort to become a top-N sort, so we
should pass -1 rather than limit_tuples to cost_sort().

Rushabh Lathia, per a report from Mithun Cy

Discussion: http://postgr.es/m/CAGPqQf1akRcSgC9=6iwx=sEPap9UvPpHJLzg8_N+OuHdb6fL+g@mail.gmail.com
2017-03-22 14:45:14 -04:00
Robert Haas d3cc37f1d8 Don't scan partitioned tables.
Partitioned tables do not contain any data; only their unpartitioned
descendents need to be scanned.  However, the partitioned tables still
need to be locked, even though they're not scanned.  To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.

Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages.  First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off.  Second, it paves the way to getting rid of
the storage for partitioned tables altogether.

Amit Langote, reviewed by me.

Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
2017-03-21 09:48:04 -04:00
Robert Haas 1ea60ad602 Fix failure to use clamp_row_est() for parallel joins.
Commit 0c2070cefa neglected to use
clamp_row_est() where it should have done so.

Patch by me.  Report by Amit Kapila.

Discussion: http://postgr.es/m/CAA4eK1KPm8RYa1Kun3ZmQj9pb723b-EFN70j47Pid1vn3ByquA@mail.gmail.com
2017-03-15 12:28:54 -04:00
Robert Haas c44c47a773 Some preliminary refactoring towards partitionwise join.
Partitionwise join proposes add a concept of child join relations,
which will have the same relationship with join relations as "other
member" relations do with base relations.  These relations will need
some but not all of the handling that we currently have for join
relations, and some but not all of the handling that we currently have
for appendrels, since they are a mix of the two.  Refactor a little
bit so that the necessary bits of logic are exposed as separate
functions.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi and
by me.

Discussion: http://postgr.es/m/CAFjFpRfqotRR6cM3sooBHMHEVdkFfAZ6PyYg4GRZsoMuW08HjQ@mail.gmail.com
2017-03-14 19:25:47 -04:00
Robert Haas 2609e91fcf Fix regression in parallel planning against inheritance tables.
Commit 51ee6f3160 accidentally changed
the behavior around inheritance hierarchies; before, we always
considered parallel paths even for very small inheritance children,
because otherwise an inheritance hierarchy with even one small child
wouldn't be eligible for parallelism.  That exception was inadverently
removed; put it back.

In passing, also adjust the degree-of-parallelism comptuation for
index-only scans not to consider the number of heap pages fetched.
Otherwise, we'll avoid parallel index-only scans on tables that are
mostly all-visible, which isn't especially logical.

Robert Haas and Amit Kapila, per a report from Ashutosh Sharma.

Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com
2017-03-14 14:33:14 -04:00
Peter Eisentraut a47b38c9ee Spelling fixes
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Peter Eisentraut f97a028d8e Spelling fixes in code comments
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Robert Haas a82178020d Update overlooked comment for Gather Merge.
Commit 355d3993c5 probably should have
done this, but nobody noticed that it was needed.
2017-03-14 07:52:11 -04:00
Robert Haas bce352fb46 Remove some bogus logic from create_gather_merge_plan.
This logic was adapated from create_merge_append_plan, but the two
cases aren't really analogous, because create_merge_append_plan is not
projection-capable and must therefore have a tlist identical to that
of the underlying paths.  Overwriting the tlist of Gather Merge with
whatever the underlying plan happens to produce is no good at all.

Patch by me, reviewed by Rushabh Lathia, who also reported the issue
and made an initial attempt at a fix.

Discussion: http://postgr.es/m/CA+Tgmob_-oHEOBfT9S25bjqokdqv8e8xEmh9zOY+3MPr_LmuhA@mail.gmail.com
2017-03-14 07:43:45 -04:00
Alvaro Herrera a9c074ba7e Silence unused variable compiler warning
Fallout from fcec6caafa2: mark a variable in
set_tablefunc_size_estimates as used for asserts only.

Also, the planner_rte_fetch() call is pointless with assertions
disabled, so enclose it in a USE_ASSERT_CHECKING #ifdef; fix the same
problem in set_subquery_size_estimates().

First problem noted by David Rowley, whose compiler is noisier than mine
in this regard.
2017-03-13 19:02:38 -03:00
Robert Haas 0ee92e1c9b Fix a couple of planner bugs in Gather Merge.
Neha Sharma reported these to Rushabh Lathia just after I commit
355d3993c5 went in.  The patch is
Rushabh's, with input from me.
2017-03-09 12:06:49 -05:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Robert Haas f35742ccb7 Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.

Dilip Kumar, with some corrections and cosmetic changes by me.  The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 12:05:43 -05:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Robert Haas 506f05423a Properly initialize variable.
Commit 3bc7dafa9b forgot to do this.

Noted while experimenting with valgrind.
2017-03-07 13:50:52 -05:00
Robert Haas 3bc7dafa9b Consider parallel merge joins.
Commit 45be99f8cd took the position
that performing a merge join in parallel was not likely to work out
well, but this conclusion was greeted with skepticism even at the
time.  Whether it was true then or not, it's clearly not true any
more now that we have parallel index scan.

Dilip Kumar, reviewed by Amit Kapila and by me.

Discussion: http://postgr.es/m/CAFiTN-v3=cM6nyFwFGp0fmvY4=kk79Hq9Fgu0u8CSJ-EEq1Tiw@mail.gmail.com
2017-03-07 11:54:51 -05:00
Robert Haas a71f10189d Preparatory refactoring for parallel merge join support.
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.

Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.

Patch by me, reviewed by Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
2017-03-07 10:33:29 -05:00
Robert Haas 655393a022 Fix parallel hash join path search.
When the very cheapest path is not parallel-safe, we want to instead use
the cheapest unparameterized path that is.  The old code searched
innerrel->cheapest_parameterized_paths, but that isn't right, because
the path we want may not be in that list.  Search innerrel->pathlist
instead.

Spotted by Dilip Kumar.

Discussion: http://postgr.es/m/CAFiTN-szCEcZrQm0i_w4xqSaRUTOUFstNu32Zn4rxxDcoa8gnA@mail.gmail.com
2017-03-07 10:22:07 -05:00
Tom Lane c56ac2913a Suppress unused-variable warning.
Rearrange so we don't have an unused variable in disable-cassert case.

Discussion: https://postgr.es/m/CAMkU=1x63f2QyFTeas83xJqD+Hm1PBuok1LrzYzS-OngDzYOVA@mail.gmail.com
2017-02-21 17:58:24 -05:00
Peter Eisentraut 38d103763d Make more use of castNode() 2017-02-21 11:59:09 -05:00
Robert Haas 0414b26bac Add optimizer and executor support for parallel index-only scans.
Commit 5262f7a4fc added similar support
for parallel index scans; this extends that work to index-only scans.
As with parallel index scans, this requires support from the index AM,
so currently parallel index-only scans will only be possible for btree
indexes.

Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja,
and Amit Kapila

Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
2017-02-19 15:57:55 +05:30
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Robert Haas 51ee6f3160 Replace min_parallel_relation_size with two new GUCs.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan.  Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way.  Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.

Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
2017-02-15 13:37:24 -05:00
Robert Haas 5e6d8d2bbb Allow parallel workers to execute subplans.
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.

Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.

Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
2017-02-14 18:16:03 -05:00
Tom Lane 8d396a0a70 Remove duplicate code in planner.c.
I noticed while hacking on join UNION transforms that planner.c's
function get_base_rel_indexes() just duplicates the functionality of
get_relids_in_jointree().  It doesn't even have the excuse of being
older code :-(.  Drop it and use the latter function instead.
2017-02-14 11:47:45 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Tom Lane 555494d1bc Fix placement of initPlans when forcibly materializing a subplan.
If we forcibly place a Material node atop a finished subplan, we need
to move any initPlans attached to the subplan up to the Material node,
in order to keep SS_finalize_plan() happy.  I'd figured this out in
commit 7b67a0a49 for the case of materializing a cursor plan, but out of
an abundance of caution, I put the initPlan movement hack at the call
site for that case, rather than inside materialize_finished_plan().
That was the wrong thing, because it turns out to also be necessary for
the only other caller of materialize_finished_plan(), ie subselect.c.
We lacked any test cases that exposed the mistake, but bug#14524 from
Wei Congrui shows that it's possible to get an initPlan reference into
the top tlist in that case too, and then SS_finalize_plan() complains.
Hence, move the hack into materialize_finished_plan().

In HEAD, also relocate some recently-added tests in subselect.sql, which
I'd unthinkingly dropped into the middle of a sequence of related tests.

Report: https://postgr.es/m/20170202060020.1400.89021@wrigleys.postgresql.org
2017-02-02 19:11:32 -05:00
Tom Lane c82d4e658e Fix mishandling of tSRFs at different nesting levels.
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal().  Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.

What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.

Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation.  But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".

Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.

In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.

Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
2017-02-02 16:38:18 -05:00
Robert Haas da08a65989 Refactor bitmap heap scan estimation of heap pages fetched.
Currently, we only need this logic in order to cost a Bitmap Heap
Scan.  But a pending patch for Parallel Bitmap Heap Scan also uses
it to help figure out how many workers to use for the scan, which
has to be determined prior to costing.  So, move the logic to
a separate function to make that easier.

Dilip Kumar.  The patch series of which this is a part has been
reviewed by Andres Freund, Amit Khendekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, and me; it is not clear from the email
discussion which of those people have looked specifically at this
part.

Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com
2017-01-27 16:28:47 -05:00
Tom Lane 3c821466ab Fix example plan in optimizer/README.
Joining three tables only takes two join nodes.  I think when I (tgl)
wrote this, I was envisioning possible additional joins; but since the
example doesn't show any fourth table, it's just confusing to write
a third join node.

Etsuro Fujita

Discussion: https://postgr.es/m/e6cfbaa3-af02-1abc-c25e-8fa5c6bc4e21@lab.ntt.co.jp
2017-01-23 09:38:36 -05:00
Tom Lane d479e37e3d Fix Assert failure induced by commit 215b43cdc.
I'd somehow talked myself into believing that set_append_rel_size
doesn't need to worry about getting back an AND clause when it applies
eval_const_expressions to the result of adjust_appendrel_attrs (that is,
transposing the appendrel parent's restriction clauses for one child).
But that is nonsense, and Andreas Seltenreich's fuzz tester soon
turned up a counterexample.  Put back the make_ands_implicit step
that was there before, and add a regression test covering the case.

Report: https://postgr.es/m/878tq6vja6.fsf@ansel.ydns.eu
2017-01-19 18:20:58 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Robert Haas 716c7d4b24 Factor out logic for computing number of parallel workers.
Forthcoming patches to allow other types of parallel scans will
need this logic, or something like it.

Dilip Kumar
2017-01-18 13:54:45 -05:00
Tom Lane 215b43cdc8 Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows.  The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.

To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation.  Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so.  This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.

This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only.  Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.

Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.

Patch by me, reviewed by Stephen Frost and Dean Rasheed.

Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 12:58:20 -05:00
Tom Lane 0777f7a2e8 Fix matching of boolean index columns to sort ordering.
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass.  Add a check to
allow the same type of deduction to be made in this case too.

Per a complaint from Dima Pavlov.  Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.

Patch by me, reviewed by Michael Paquier

Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us
2017-01-15 14:09:35 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Robert Haas 0c2070cefa Fix cardinality estimates for parallel joins.
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong.  The previous coding
got this completely wrong for parallel joins.

Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet.  Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.

Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com
2017-01-13 13:34:10 -05:00
Robert Haas 18fc5192a6 Remove unnecessary arguments from partitioning functions.
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.

Extracted by me from a larger patch by Amit Langote.
2017-01-04 14:56:37 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas 59649c3f1c Refactor merge path generation code.
This shouldn't change the set of paths that get generated in any
way, but it is preparatory work for further changes to allow a
partial path to be merge-joined witih a non-partial path to produce
a partial join path.

Dilip Kumar, with cosmetic adjustments by me.
2016-12-21 09:45:50 -05:00
Tom Lane 7fa93eec4e Fix FK-based join selectivity estimation for semi/antijoins.
This case wasn't thought through sufficiently in commit 100340e2d.
It's true that the FK proves that every outer row has a match in the
inner table, but we forgot that some of the inner rows might be filtered
away by WHERE conditions located within the semijoin's RHS.

If the RHS is just one table, we can reasonably take the semijoin
selectivity as equal to the fraction of the referenced table's rows
that are expected to survive its restriction clauses.

If the RHS is a join, it's not clear how much of the referenced table
might get through the join, so fall back to the same rule we were
already using for other outer-join cases: use the minimum of the
regular per-clause selectivity estimates.  This gives the same result
as if we hadn't considered the FK at all when there's a single FK
column, but it should still help for multi-column FKs, which is the
case that 100340e2d is really meant to help with.

Back-patch to 9.6 where the previous commit came in.

Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
2016-12-17 15:28:54 -05:00
Tom Lane 1f542a2eac Prevent planagg.c from failing on queries containing CTEs.
The existing tests in preprocess_minmax_aggregates() usually prevent it
from trying to do anything with queries containing CTEs, but there's an
exception: a CTE could be present as a member of an appendrel, if we
flattened a UNION ALL that contains CTE references.  If it did try to
generate an optimized path for a query using a CTE, it failed with
"could not find plan for CTE", as reported by Torsten Förtsch.

The proximate cause is an unwise decision in commit 3fc6e2d7f to clear
subroot->cte_plan_ids in build_minmax_path().  That left the subroot's
cte_plan_ids list out of step with its parse->cteList.

Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let
the case work again, but really it's pretty silly to be expending any
cycles at all in this module when there are CTEs: we always treat their
outputs as unordered so there's no way for the optimization to win.
Hence, also add an early-exit test so we don't waste time like that.

Back-patch to 9.6 where the misbehavior was introduced.

Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com
2016-12-13 13:20:37 -05:00
Tom Lane 0b78106cd4 Fix reporting of column typmods for multi-row VALUES constructs.
expandRTE() and get_rte_attribute_type() reported the exprType() and
exprTypmod() values of the expressions in the first row of the VALUES as
being the column type/typmod returned by the VALUES RTE.  That's fine for
the data type, since we coerce all expressions in a column to have the same
common type.  But we don't coerce them to have a common typmod, so it was
possible for rows after the first one to return values that violate the
claimed column typmod.  This leads to the incorrect result seen in bug
#14448 from Hassan Mahmood, as well as some other corner-case misbehaviors.

The desired behavior is the same as we use in other type-unification
cases: report the common typmod if there is one, but otherwise return -1
indicating no particular constraint.  It's cheap for transformValuesClause
to determine the common typmod while transforming a multi-row VALUES, but
it'd be less cheap for expandRTE() and get_rte_attribute_type() to
re-determine that info every time they're asked --- possibly a lot less
cheap, if the VALUES has many rows.  Therefore, the best fix is to record
the common typmods explicitly in a list in the VALUES RTE, as we were
already doing for column collations.  This looks quite a bit like what
we're doing for CTE RTEs, so we can save a little bit of space and code by
unifying the representation for those two RTE types.  They both now share
coltypes/coltypmods/colcollations fields.  (At some point it might seem
desirable to populate those fields for all RTE types; but right now it
looks like constructing them for other RTE types would add more code and
cycles than it would save.)

The RTE change requires a catversion bump, so this fix is only usable
in HEAD.  If we fix this at all in the back branches, the patch will
need to look quite different.

Report: https://postgr.es/m/20161205143037.4377.60754@wrigleys.postgresql.org
Discussion: https://postgr.es/m/27429.1480968538@sss.pgh.pa.us
2016-12-08 11:40:02 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Tom Lane 41e2b84ce1 Fix bogus handling of JOIN_UNIQUE_OUTER/INNER cases for parallel joins.
consider_parallel_nestloop passed the wrong jointype down to its
subroutines for JOIN_UNIQUE_INNER cases (it should pass JOIN_INNER), and it
thought that it could pass paths other than innerrel->cheapest_total_path
to create_unique_path, which create_unique_path is not on board with.
These bugs would lead to assertion failures or other errors, suggesting
that this code path hasn't been tested much.

hash_inner_and_outer's code for parallel join effectively treated both
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER the same as JOIN_INNER (for
different reasons :-(), leading to incorrect plans that treated a semijoin
as if it were a plain join.

Michael Day submitted a test case demonstrating that hash_inner_and_outer
failed for JOIN_UNIQUE_OUTER, and I found the other cases through code
review.

Report: https://postgr.es/m/D0E8A029-D1AC-42E8-979A-5DE4A77E4413@rcmail.com
2016-11-29 19:32:35 -05:00
Tom Lane d6c8b34e95 Fix incorrect variable type in set_rel_consider_parallel().
func_parallel() returns char not Oid.  Harmless, but still wrong.

Amit Langote
2016-11-29 11:07:02 -05:00
Tom Lane 4e20511d5b Fix estimate_expression_value to constant-fold SQLValueFunction nodes.
Oversight in my commit 0bb51aa96.  Noted while poking at a recent
bug report --- HEAD's estimates for a query using CURRENT_DATE
were unexpectedly much worse than 9.6's.
2016-11-28 19:08:45 -05:00
Alvaro Herrera eb68141688 Fix get_relation_info name typo'ed in a comment
Plus add a missing comment about this in get_relation_info itself.

Author: Amit Langote
Discussion: https://postgr.es/m/e46c0569-0449-afa0-e2fe-f3776e4b3fd5@lab.ntt.co.jp
2016-11-28 15:56:00 -03:00
Tom Lane ab77a5a456 Mark a query's topmost Paths parallel-unsafe if they will have initPlans.
Andreas Seltenreich found another case where we were being too optimistic
about allowing a plan to be considered parallelizable despite it containing
initPlans.  It seems like the real issue here is that if we know we are
going to tack initPlans onto the topmost Plan node for a subquery, we
had better mark that subquery's result Paths as not-parallel-safe.  That
fixes this problem and allows reversion of a kluge (added in commit
7b67a0a49 and extended in f24cf960d) to not trust the parallel_safe flag
at top level.

Discussion: <874m2w4k5d.fsf@ex.ansel.ydns.eu>
2016-11-25 16:20:12 -05:00
Tom Lane 6fa391be4e Avoid masking a function parameter name with a local variable name.
No actual bug here, but it might confuse readers, so change the name
of the local variable.

Ashutosh Bapat
2016-11-23 16:26:40 -05:00
Tom Lane 4324ade9a6 Fix optimization for skipping searches for parallel-query hazards.
Fix thinko in commit da1c91631: even if the original query was free of
parallel hazards, we might introduce such a hazard by adding PARAM_EXEC
Param nodes.  Adjust is_parallel_safe() so that it will scan the given
expression whenever any such nodes have been created.  Per report from
Andreas Seltenreich.

Discussion: <878tse6yvf.fsf@credativ.de>
2016-11-21 13:19:23 -05:00
Tom Lane f24cf960d7 Fix test for subplans in force-parallel mode.
We mustn't force parallel mode if the query has any subplans, since
ExecSerializePlan doesn't transmit them to workers.  Testing
top_plan->initPlan is inadequate because (1) there might be initPlans
attached to lower plan nodes, and (2) non-initPlan subplans don't
work either.  There's certainly room for improvement in those
restrictions, but for the moment that's what we've got.

Amit Kapila, per report from Andreas Seltenreich

Discussion: <8737im6pmh.fsf@credativ.de>
2016-11-21 11:09:24 -05:00
Tom Lane 0832f2db68 Fix latent costing error in create_merge_append_path.
create_merge_append_path should use the path rowcount it just computed,
not rel->tuples, for costing purposes.  Those numbers should always be
the same at present, but if we ever support parameterized MergeAppend
paths (a case this function is otherwise prepared for), the former would
be right and the latter wrong.

No need for back-patch since the problem is only latent.

Ashutosh Bapat

Discussion: <CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com>
2016-11-19 15:06:45 -05:00
Tom Lane 24aef33804 Cleanup of rewriter and planner handling of Query.hasRowSecurity flag.
Be sure to pull up the subquery's hasRowSecurity flag when flattening a
subquery in pull_up_simple_subquery().  This isn't a bug today because
we don't look at the hasRowSecurity flag during planning, but it could
easily be a bug tomorrow.

Likewise, make rewriteRuleAction() pull up the hasRowSecurity flag when
absorbing RTEs from a rule action.  This isn't a bug either, for the
opposite reason: the flag should never be set yet.  But again, it seems
like good future proofing.

Add a comment explaining why rewriteTargetView() should *not* set
hasRowSecurity when adding stuff to securityQuals.

Improve some nearby comments about securityQuals processing, and document
that field more completely in parsenodes.h.

Patch by me, analysis by Dean Rasheed.

Discussion: <CAEZATCXZ8tb2DV6f=bkhsMV6u_gRcZ0CZBw2J-qU84RxSukZog@mail.gmail.com>
2016-11-10 16:16:33 -05:00
Tom Lane e1b449bea9 Fix partial aggregation for the case of a degenerate GROUP BY clause.
The plan generated for sorted partial aggregation with "GROUP BY constant"
included a Sort node with no sort keys, which the executor does not like.

Per report from Steve Randall.  I'd add a regression test case if I could
think of a compact one, but it doesn't seem worth expending lots of cycles
on.

Report: <CABVd52UAdGXpg_rCk46egpNKYdXOzCjuJ1zG26E2xBe_8bj+Fg@mail.gmail.com>
2016-11-10 11:31:56 -05:00
Tom Lane 34ca090570 Adjust cost_merge_append() to reflect use of binaryheap_replace_first().
Commit 7a2fe9bd0 improved merge append so that replacement of a tuple
takes log(N) operations, not twice log(N).  Since cost_merge_append knew
about that explicitly, we should adjust it.  This probably makes little
difference in practice, but the obsolete comment is confusing.

Ideally this would have been put in in 9.3 with the underlying behavior
change; but I'm not going to back-patch it, since there's some small chance
of changing a plan choice that somebody's optimized for.

Thomas Munro

Discussion: <CAEepm=0WQBSvuYcMOUj4Ga4NXpu2J=ejZcE=e=eiTjTX-6_gDw@mail.gmail.com>
2016-11-05 13:48:11 -04:00
Tom Lane 770671062f Don't make FK-based selectivity estimates in inheritance situations.
The foreign-key-aware logic for estimation of join sizes (added in commit
100340e2d) blindly tried to apply the concept to rels that are actually
parents of inheritance trees.  This is just plain wrong so far as the
referenced relation is concerned, since the inheritance scan may well
produce lots of rows that are not participating in the constraint.  It's
wrong for the referencing relation too, for the same reason; although on
that end we could conceivably detect whether all members of the inheritance
tree have equivalent FK constraints pointing to the same referenced rel,
and then proceed more or less as we do now.  But pending somebody writing
code to do that, we must disable this, because it's producing completely
silly estimates when there's an FK linking the heads of inheritance trees.

Per bug #14404 from Clinton Adams.  Back-patch to 9.6 where the new
estimation logic came in.

Report: <20161028200412.15987.96482@wrigleys.postgresql.org>
2016-11-02 15:50:15 -04:00
Tom Lane da8f3ebf30 Don't convert Consts into Vars during setrefs.c processing.
While converting expressions in an upper-level plan node so that they
reference Vars and expressions provided by the input plan node(s),
don't convert plain Const items, even if there happens to be a matching
Const in the input.  It's silly to do so because a Var is more expensive to
execute than a Const.  Moreover, converting can fool ExecCheckPlanOutput's
check that an insert or update query inserts nulls into dropped columns,
leading to "query provides a value for a dropped column" errors during
INSERT or UPDATE on a table with a dropped column.  We could solve this
by making that check more complicated, but I don't see the point; this fix
should save a marginal number of cycles, and it also makes for less messy
EXPLAIN output, as shown by the ensuing regression test result changes.

Per report from Pavel Hanák.  I have not incorporated a test case based
on that example, as there doesn't seem to be a simple way of checking
this in isolation without making a bunch of assumptions about other
planner and SQL-function behavior.

Back-patch to 9.6.  This setrefs.c behavior exists much further back,
but there is not currently reason to think that it causes problems
before 9.6.

Discussion: <83shraampf.fsf@is-it.eu>
2016-11-02 14:32:13 -04:00
Tom Lane 9a00f03e47 Improve speed of aggregates that use array_append as transition function.
In the previous coding, if an aggregate's transition function returned an
expanded array, nodeAgg.c and nodeWindowAgg.c would always copy it and thus
force it into the flat representation.  This led to ping-ponging between
flat and expanded formats, which costs a lot.  For an aggregate using
array_append as transition function, I measured about a 15X slowdown
compared to the pre-9.5 code, when working on simple int[] arrays.
Of course, the old code was already O(N^2) in this usage due to copying
flat arrays all the time, but it wasn't quite this inefficient.

To fix, teach nodeAgg.c and nodeWindowAgg.c to allow expanded transition
values without copying, so long as the transition function takes care to
return the transition value already properly parented under the aggcontext.
That puts a bit of extra responsibility on the transition function, but
doing it this way allows us to not need any extra logic in the fast path
of advance_transition_function (ie, with a pass-by-value transition value,
or with a modified-in-place pass-by-reference value).  We already know
that that's a hot spot so I'm loath to add any cycles at all there.  Also,
while only array_append currently knows how to follow this convention,
this solution allows other transition functions to opt-in without needing
to have a whitelist in the core aggregation code.

(The reason we would need a whitelist is that currently, if you pass a
R/W expanded-object pointer to an arbitrary function, it's allowed to do
anything with it including deleting it; that breaks the core agg code's
assumption that it should free discarded values.  Returning a value under
aggcontext is the transition function's signal that it knows it is an
aggregate transition function and will play nice.  Possibly the API rules
for expanded objects should be refined, but that would not be a
back-patchable change.)

With this fix, an aggregate using array_append is no longer O(N^2), so it's
much faster than pre-9.5 code rather than much slower.  It's still a bit
slower than the bespoke infrastructure for array_agg, but the differential
seems to be only about 10%-20% rather than orders of magnitude.

Discussion: <6315.1477677885@sss.pgh.pa.us>
2016-10-30 12:27:41 -04:00
Robert Haas 7012b132d0 postgres_fdw: Push down aggregates to remote servers.
Now that the upper planner uses paths, and now that we have proper hooks
to inject paths into the upper planning process, it's possible for
foreign data wrappers to arrange to push aggregates to the remote side
instead of fetching all of the rows and aggregating them locally.  This
figures to be a massive win for performance, so teach postgres_fdw to
do it.

Jeevan Chalke and Ashutosh Bapat.  Reviewed by Ashutosh Bapat with
additional testing by Prabhat Sahu.  Various mostly cosmetic changes
by me.
2016-10-21 09:54:29 -04:00
Andres Freund 5dfc198146 Use more efficient hashtable for execGrouping.c to speed up hash aggregation.
The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.

Due to the the different hash table queries that not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.

The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional.  The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size.  That
size is only used for the initial sizing of the hash-table.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-14 17:22:51 -07:00
Tom Lane 72daabc7a3 Disallow pushing volatile quals past set-returning functions.
Pushing an upper-level restriction clause into an unflattened
subquery-in-FROM is okay when the subquery contains no SRFs in its
targetlist, or when it does but the SRFs are unreferenced by the clause
*and the clause is not volatile*.  Otherwise, we're changing the number
of times the clause is evaluated, which is bad for volatile quals, and
possibly changing the result, since a volatile qual might succeed for some
SRF output rows and not others despite not referencing any of the changing
columns.  (Indeed, if the clause is something like "random() > 0.5", the
user is probably expecting exactly that behavior.)

We had most of these restrictions down, but not the one about the upper
clause not being volatile.  Fix that, and add a regression test to
illustrate the expected behavior.

Although this is definitely a bug, it doesn't seem like back-patch
material, since possibly some users don't realize that the broken
behavior is broken and are relying on what happens now.  Also, while
the added test is quite cheap in the wake of commit a4c35ea1c, it would
be much more expensive (or else messier) in older branches.

Per report from Tom van Tilburg.

Discussion: <CAP3PPDiucxYCNev52=YPVkrQAPVF1C5PFWnrQPT7iMzO1fiKFQ@mail.gmail.com>
2016-09-27 18:43:36 -04:00
Tom Lane a4c35ea1c2 Improve parser's and planner's handling of set-returning functions.
Teach the parser to reject misplaced set-returning functions during parse
analysis using p_expr_kind, in much the same way as we do for aggregates
and window functions (cf commit eaccfded9).  While this isn't complete
(it misses nesting-based restrictions), it's much better than the previous
error reporting for such cases, and it allows elimination of assorted
ad-hoc expression_returns_set() error checks.  We could add nesting checks
later if it seems important to catch all cases at parse time.

There is one case the parser will now throw error for although previous
versions allowed it, which is SRFs in the tlist of an UPDATE.  That never
behaved sensibly (since it's ill-defined which generated row should be
used to perform the update) and it's hard to see why it should not be
treated as an error.  It's a release-note-worthy change though.

Also, add a new Query field hasTargetSRFs reporting whether there are
any SRFs in the targetlist (including GROUP BY/ORDER BY expressions).
The parser can now set that basically for free during parse analysis,
and we can use it in a number of places to avoid expression_returns_set
searches.  (There will be more such checks soon.)  In some places, this
allows decontorting the logic since it's no longer expensive to check for
SRFs in the tlist --- so I made the checks parallel to the handling of
hasAggs/hasWindowFuncs wherever it seemed appropriate.

catversion bump because adding a Query field changes stored rules.

Andres Freund and Tom Lane

Discussion: <24639.1473782855@sss.pgh.pa.us>
2016-09-13 13:54:24 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane 2c00fad286 Fix improper repetition of previous results from a hashed aggregate.
ExecReScanAgg's check for whether it could re-use a previously calculated
hashtable neglected the possibility that the Agg node might reference
PARAM_EXEC Params that are not referenced by its input plan node.  That's
okay if the Params are in upper tlist or qual expressions; but if one
appears in aggregate input expressions, then the hashtable contents need
to be recomputed when the Param's value changes.

To avoid unnecessary performance degradation in the case of a Param that
isn't within an aggregate input, add logic to the planner to determine
which Params are within aggregate inputs.  This requires a new field in
struct Agg, but fortunately we never write plans to disk, so this isn't
an initdb-forcing change.

Per report from Jeevan Chalke.  This has been broken since forever,
so back-patch to all supported branches.

Andrew Gierth, with minor adjustments by me

Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>
2016-08-24 14:38:12 -04:00
Tom Lane 65a603e903 Guard against parallel-restricted functions in VALUES expressions.
Obvious brain fade in set_rel_consider_parallel().  Noticed it while
adjusting the adjacent RTE_FUNCTION case.

In 9.6, also make the code look more like what I just did in HEAD
by removing the unnecessary function_rte_parallel_ok subroutine
(it does nothing that expression_tree_walker wouldn't do).
2016-08-19 14:35:32 -04:00
Tom Lane da1c91631e Speed up planner's scanning for parallel-query hazards.
We need to scan the whole parse tree for parallel-unsafe functions.
If there are none, we'll later need to determine whether particular
subtrees contain any parallel-restricted functions.  The previous coding
retained no knowledge from the first scan, even though this is very
wasteful in the common case where the query contains only parallel-safe
functions.  We can bypass all of the later scans by remembering that fact.
This provides a small but measurable speed improvement when the case
applies, and shouldn't cost anything when it doesn't.

Patch by me, reviewed by Robert Haas

Discussion: <3740.1471538387@sss.pgh.pa.us>
2016-08-19 14:03:13 -04:00
Tom Lane 0bb51aa967 Improve parsetree representation of special functions such as CURRENT_DATE.
We implement a dozen or so parameterless functions that the SQL standard
defines special syntax for.  Up to now, that was done by converting them
into more or less ad-hoc constructs such as "'now'::text::date".  That's
messy for multiple reasons: it exposes what should be implementation
details to users, and performance is worse than it needs to be in several
cases.  To improve matters, invent a new expression node type
SQLValueFunction that can represent any of these parameterless functions.

Bump catversion because this changes stored parsetrees for rules.

Discussion: <30058.1463091294@sss.pgh.pa.us>
2016-08-16 20:33:01 -04:00
Tom Lane f0c7b789ab Fix two errors with nested CASE/WHEN constructs.
ExecEvalCase() tried to save a cycle or two by passing
&econtext->caseValue_isNull as the isNull argument to its sub-evaluation of
the CASE value expression.  If that subexpression itself contained a CASE,
then *isNull was an alias for econtext->caseValue_isNull within the
recursive call of ExecEvalCase(), leading to confusion about whether the
inner call's caseValue was null or not.  In the worst case this could lead
to a core dump due to dereferencing a null pointer.  Fix by not assigning
to the global variable until control comes back from the subexpression.
Also, avoid using the passed-in isNull pointer transiently for evaluation
of WHEN expressions.  (Either one of these changes would have been
sufficient to fix the known misbehavior, but it's clear now that each of
these choices was in itself dangerous coding practice and best avoided.
There do not seem to be any similar hazards elsewhere in execQual.c.)

Also, it was possible for inlining of a SQL function that implements the
equality operator used for a CASE comparison to result in one CASE
expression's CaseTestExpr node being inserted inside another CASE
expression.  This would certainly result in wrong answers since the
improperly nested CaseTestExpr would be caused to return the inner CASE's
comparison value not the outer's.  If the CASE values were of different
data types, a crash might result; moreover such situations could be abused
to allow disclosure of portions of server memory.  To fix, teach
inline_function to check for "bare" CaseTestExpr nodes in the arguments of
a function to be inlined, and avoid inlining if there are any.

Heikki Linnakangas, Michael Paquier, Tom Lane

Report: https://github.com/greenplum-db/gpdb/pull/327
Report: <4DDCEEB8.50602@enterprisedb.com>
Security: CVE-2016-5423
2016-08-08 10:33:46 -04:00
Tom Lane 9492cf86e4 Fix assorted fallout from IS [NOT] NULL patch.
Commits 4452000f3 et al established semantics for NullTest.argisrow that
are a bit different from its initial conception: rather than being merely
a cache of whether we've determined the input to have composite type,
the flag now has the further meaning that we should apply field-by-field
testing as per the standard's definition of IS [NOT] NULL.  If argisrow
is false and yet the input has composite type, the construct instead has
the semantics of IS [NOT] DISTINCT FROM NULL.  Update the comments in
primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print
such cases correctly.  In the case of ruleutils.c, this merely results in
cosmetic changes in EXPLAIN output, since the case can't currently arise
in stored rules.  However, it represents a live bug for deparse.c, which
would formerly have sent a remote query that had semantics different
from the local behavior.  (From the user's standpoint, this means that
testing a remote nested-composite column for null-ness could have had
unexpected recursive behavior much like that fixed in 4452000f3.)

In a related but somewhat independent fix, make plancat.c set argisrow
to false in all NullTest expressions constructed to represent "attnotnull"
constructs.  Since attnotnull is actually enforced as a simple null-value
check, this is a more accurate representation of the semantics; we were
previously overpromising what it meant for composite columns, which might
possibly lead to incorrect planner optimizations.  (It seems that what the
SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so
arguably we are violating the spec and should fix attnotnull to do the
other thing.  If we ever do, this part should get reverted.)

Back-patch, same as the previous commit.

Discussion: <10682.1469566308@sss.pgh.pa.us>
2016-07-28 16:09:15 -04:00
Tom Lane 69995c3b3f Fix cost_rescan() to account for multi-batch hashing correctly.
cost_rescan assumed that we don't need to rebuild the hash table when
rescanning a hash join.  However, that's currently only true for
single-batch joins; for a multi-batch join we must charge full freight.

This probably has escaped notice because we'd be unlikely to put a hash
join on the inside of a nestloop anyway.  Nonetheless, it's wrong.
Fix in HEAD, but don't backpatch for fear of destabilizing plans in
stable releases.
2016-07-27 17:45:05 -04:00
Tom Lane 4452000f31 Fix constant-folding of ROW(...) IS [NOT] NULL with composite fields.
The SQL standard appears to specify that IS [NOT] NULL's tests of field
nullness are non-recursive, ie, we shouldn't consider that a composite
field with value ROW(NULL,NULL) is null for this purpose.
ExecEvalNullTest got this right, but eval_const_expressions did not,
leading to weird inconsistencies depending on whether the expression
was such that the planner could apply constant folding.

Also, adjust the docs to mention that IS [NOT] DISTINCT FROM NULL can be
used as a substitute test if a simple null check is wanted for a rowtype
argument.  That motivated reordering things so that IS [NOT] DISTINCT FROM
is described before IS [NOT] NULL.  In HEAD, I went a bit further and added
a table showing all the comparison-related predicates.

Per bug #14235.  Back-patch to all supported branches, since it's certainly
undesirable that constant-folding should change the semantics.

Report and patch by Andrew Gierth; assorted wordsmithing and revised
regression test cases by me.

Report: <20160708024746.1410.57282@wrigleys.postgresql.org>
2016-07-26 15:25:02 -04:00
Tom Lane 6d85bb1ba7 Correctly set up aggregate FILTER expression in partial-aggregation plans.
The aggfilter expression should be removed from the parent (combining)
Aggref, since it's not supposed to apply the filter, and indeed cannot
because any Vars used in the filter would not be available after the
lower-level aggregation step.  Per report from Jeff Janes.

(This has been broken since the introduction of partial aggregation,
I think.  The error became obvious after commit 59a3795c2, when setrefs.c
began processing the parent Aggref's fields normally and thus would detect
such Vars.  The special-case coding previously used in setrefs.c skipped
over the parent's aggfilter field without processing it.  That was broken
in its own way because no other setrefs.c processing got applied either;
though since the executor would not execute the filter expression, only
initialize it, that oversight might not have had any visible symptoms at
present.)

Report: <CAMkU=1xfuPf2edAe4ZGXTmJpU7jxuKukKyvNtEXwu35B7dvejg@mail.gmail.com>
2016-07-23 20:16:48 -04:00
Tom Lane 45639a0525 Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings.  Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design.  Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value.  If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is.  Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.

This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied.  We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID.  In that case, mark the plan as only
runnable by that userID.

The plancache code already had a notion of plans being userID-specific,
in order to support RLS.  It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID.  Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.

Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one.  It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.

In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.

This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.

Etsuro Fujita and Tom Lane

Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
2016-07-15 17:23:02 -04:00
Peter Eisentraut 63cfdb8dde Adjust spellings of forms of "cancel" 2016-07-14 22:48:26 -04:00
Tom Lane cec5501394 Add a regression test case to improve code coverage for tuplesort.
Test the external-sort code path in CLUSTER for two different scenarios:
multiple-pass external sorting, and the best case for replacement
selection, where only one run is produced, so that no merge is required.
This test would have caught the bug fixed in commit 1b0fc8507, at
least when run with valgrind enabled.

In passing, add a short-circuit test in plan_cluster_use_sort() to make
dead certain that it selects sorting when enable_indexscan is off.  As
things stand, that would happen anyway, but it seems like good future
proofing for this test.

Peter Geoghegan

Discussion: <CAM3SWZSgxehDkDMq1FdiW2A0Dxc79wH0hz1x-TnGy=1BXEL+nw@mail.gmail.com>
2016-07-13 15:23:56 -04:00
Tom Lane 29a2195de6 Typo fix. 2016-07-03 18:43:43 -04:00
Tom Lane 110a6dbdeb Allow RTE_SUBQUERY rels to be considered parallel-safe.
There isn't really any reason not to; the original comments here were
partly confused about subplans versus subquery-in-FROM, and partly
dependent on restrictions that no longer apply now that subqueries return
Paths not Plans.  Depending on what's inside the subquery, it might fail
to produce any parallel_safe Paths, but that's fine.

Tom Lane and Robert Haas
2016-07-03 18:24:49 -04:00
Tom Lane 4ea9948e58 Fix up parallel-safety marking for appendrels.
The previous coding assumed that the value derived by
set_rel_consider_parallel() for an appendrel parent would be accurate for
all the appendrel's children; but this is not so, for example because one
child might scan a temp table.  Instead, apply set_rel_consider_parallel()
to each child rel as well as the parent, and then take the AND of the
results as controlling parallel safety for the appendrel as a whole.

(We might someday be able to deal more intelligently than this with cases
in which some of the childrels are parallel-safe and others not, but that's
for later.)

Robert Haas and Tom Lane
2016-07-03 17:57:28 -04:00
Tom Lane 2c6e6471af Allow treating TABLESAMPLE scans as parallel-safe.
This was the intention all along, but an extraneous "return;" in
set_rel_consider_parallel() caused sampled rels to never be marked
consider_parallel.

Since we don't have any partial tablesample path/plan type yet, there's
no possibility of parallelizing the sample scan itself; but this fix
allows such a scan to appear below a parallel join, for example.
2016-07-03 16:55:27 -04:00
Tom Lane 0e495c5e2f Set correct cost data in Gather node added by force_parallel_mode.
We were just leaving the cost fields zeroes, which produces obviously bogus
output with force_parallel_mode = on.  With force_parallel_mode = regress,
the zeroes are hidden, but I wonder if they wouldn't still confuse add-on
code such as auto_explain.
2016-07-03 15:35:29 -04:00
Tom Lane c89d507649 Round rowcount estimate for a partial path to an integer.
I'd been wondering why I was sometimes seeing fractional rowcount
estimates in parallel-query situations, and this seems to be the
reason.  (You won't see the fractional parts in EXPLAIN, because it
prints rowcounts with %.0f, but they are apparent in the debugger.)
A fractional rowcount is not any saner for a partial path than any
other kind of path, and it's equally likely to break cost estimation
for higher paths, so apply clamp_row_est() like we do in other places.
2016-07-03 14:53:46 -04:00
Tom Lane 420c166163 Fix failure to mark all aggregates with appropriate transtype.
In commit 915b703e1 I gave get_agg_clause_costs() the responsibility of
marking Aggref nodes with the appropriate aggtranstype.  I failed to notice
that where it was being called from, it might see only a subset of the
Aggref nodes that were in the original targetlist.  Specifically, if there
are duplicate aggregate calls in the tlist, either make_sort_input_target
or make_window_input_target might put just a single instance into the
grouping_target, and then only that instance would get marked.  Fix by
moving the call back into grouping_planner(), before we start building
assorted PathTargets from the query tlist.  Per report from Stefan Huehner.

Report: <20160702131056.GD3165@huehner.biz>
2016-07-02 13:23:12 -04:00
Tom Lane 7b67a0a49c Fix some interrelated planner issues with initPlans and Param munging.
In commit 68fa28f77 I tried to teach SS_finalize_plan() to cope with
initPlans attached anywhere in the plan tree, by dint of moving its
handling of those into the recursion in finalize_plan().  It turns out that
that doesn't really work: if a lower-level plan node emits an initPlan
output parameter in its targetlist, it's legitimate for upper levels to
reference those Params --- and at the point where this code runs, those
references look just like the Param itself, so finalize_plan() quite
properly rejects them as being in the wrong place.  We could lobotomize
the checks enough to allow that, probably, but then it's not clear that
we'd have any meaningful check for misplaced Params at all.  What seems
better, at least in the near term, is to tweak standard_planner() a bit
so that initPlans are never placed anywhere but the topmost plan node
for a query level, restoring the behavior that occurred pre-9.6.  Possibly
we can do better if this code is ever merged into setrefs.c: then it would
be possible to check a Param's placement only when we'd failed to replace
it with a Var referencing a child plan node's targetlist.

BTW, I'm now suspicious that finalize_plan is doing the wrong thing by
returning the node's allParam rather than extParam to be incorporated
in the parent node's set of used parameters.  However, it makes no
difference given that initPlans only appear at top level, so I'll leave
that alone for now.

Another thing that emerged from this is that standard_planner() needs
to check for initPlans before deciding that it's safe to stick a Gather
node on top in force_parallel_mode mode.  We previously guarded against
that by deciding the plan wasn't wholePlanParallelSafe if any subplans
had been found, but after commit 5ce5e4a12 it's necessary to have this
substitute test, because path parallel_safe markings don't account for
initPlans.  (Normally, we'd have decided the paths weren't safe anyway
due to appearances of SubPlan nodes, Params, or CTE scans somewhere in
the tree --- but it's possible for those all to be optimized away while
initPlans still remain.)

Per fuzz testing by Andreas Seltenreich.

Report: <874m89rw7x.fsf@credativ.de>
2016-07-01 20:06:05 -04:00
Tom Lane 9e703987a8 Rethink the GetForeignUpperPaths API (again).
In the previous design, the GetForeignUpperPaths FDW callback hook was
called before we got around to labeling upper relations with the proper
consider_parallel flag; this meant that any upper paths created by an FDW
would be marked not-parallel-safe.  While that's probably just as well
right now, we aren't going to want it to be true forever.  Hence, abandon
the idea that FDWs should be allowed to inject upper paths before the core
code has gotten around to creating the relevant upper relation.  (Well,
actually they still can, but it's on their own heads how well it works.)
Instead, adopt the same API already designed for create_upper_paths_hook:
we call GetForeignUpperPaths after each upperrel has been created and
populated with the paths the core planner knows how to make.
2016-07-01 13:12:34 -04:00
Robert Haas 5ce5e4a12e Set consider_parallel correctly for upper planner rels.
Commit 3fc6e2d7f5 introduced new "upper"
RelOptInfo structures but didn't set consider_parallel for them
correctly, a point I completely missed when reviewing it.  Later,
commit e06a38965b made the situation
worse by doing it incorrectly for the grouping relation.  Try to
straighten all of that out.  Along the way, get rid of the annoying
wholePlanParallelSafe flag, which was only necessarily because of
the fact that upper planning stages didn't use paths at the time
that code was written.

The most important immediate impact of these changes is that
force_parallel_mode will provide useful test coverage in quite a few
more scenarios than it did previously, but it's also necessary
preparation for fixing some problems related to subqueries.

Patch by me, reviewed by Tom Lane.
2016-07-01 11:52:56 -04:00
Tom Lane 3154e16737 Dodge compiler bug in Visual Studio 2013.
VS2013 apparently has a problem with taking the address of a formal
parameter in some cases.  We do that elsewhere without trouble, but
in this case the address is being passed to a subroutine that will
probably get inlined, so maybe the combination of those things is
what tickles the bug.  Anyway, introducing an extra copy of the
parameter value is enough to work around it.  Per trouble report
from Umair Shahid.

Report: <CAM184AcjqKYZSdQqBHDrnENXHhW=mXbUC46QYPJ=nAh0gUHCGA@mail.gmail.com>
2016-06-29 19:07:19 -04:00
Tom Lane b32e63506c Fix match_foreign_keys_to_quals for FKs linking to unused rtable entries.
Since get_relation_foreign_keys doesn't try to determine whether RTEs
are actually part of the query semantics, it might make FK info records
linking to RTEs that won't have a RelOptInfo at all.  Cope with that.
Per bug #14219 from Andrew Gierth.

Report: <20160629183338.1397.43514@wrigleys.postgresql.org>
2016-06-29 16:02:08 -04:00
Tom Lane c12f02ffc9 Don't apply sortgroupref labels to a tlist that might not match.
If we need to use a gating Result node for pseudoconstant quals,
create_scan_plan() intentionally suppresses use_physical_tlist's checks
on whether there are matches for sortgroupref labels, on the grounds that
we don't need matches because we can label the Result's projection output
properly.  However, it then called apply_pathtarget_labeling_to_tlist
anyway.  This oversight was harmless when written, but in commit aeb9ae645
I made that function throw an error if there was no match.  Thus, the
combination of a table scan, pseudoconstant quals, and a non-simple-Var
sortgroupref column threw the dreaded "ORDER/GROUP BY expression not found
in targetlist" error.  To fix, just skip applying the labeling in this
case.  Per report from Rushabh Lathia.

Report: <CAGPqQf2iLB8t6t-XrL-zR233DFTXxEsfVZ4WSqaYfLupEoDxXA@mail.gmail.com>
2016-06-28 10:43:11 -04:00
Tom Lane f1993038a4 Avoid making a separate pass over the query to check for partializability.
It's rather silly to make a separate pass over the tlist + HAVING qual,
and a separate set of visits to the syscache, when get_agg_clause_costs
already has all the required information in hand.  This nets out as less
code as well as fewer cycles.
2016-06-26 15:55:01 -04:00
Tom Lane 19e972d558 Rethink node-level representation of partial-aggregation modes.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.

Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are).  By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.

While at it, get rid of Aggref.aggoutputtype.  That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.

Assorted comment cleanup as well (there's still more that I want to do
in that line).

catversion bump for change in Aggref node contents.  Should be the last
one for partial-aggregation changes.

Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-26 14:33:38 -04:00
Tom Lane 59a3795c25 Simplify planner's final setup of Aggrefs for partial aggregation.
Commit e06a38965's original coding for constructing the execution-time
expression tree for a combining aggregate was rather messy, involving
duplicating quite a lot of code in setrefs.c so that it could inject
a nonstandard matching rule for Aggrefs.  Get rid of that in favor of
explicitly constructing a combining Aggref with a partial Aggref as input,
then allowing setref's normal matching logic to match the partial Aggref
to the output of the lower plan node and hence replace it with a Var.

In passing, rename and redocument make_partialgroup_input_target to have
some connection to what it actually does.
2016-06-26 12:08:12 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Tom Lane 8b9d323cb9 Refactor planning of projection steps that don't need a Result plan node.
The original upper-planner-pathification design (commit 3fc6e2d7f5)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting).  We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6.  So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't.  This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.

Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node.  Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that.  That was uncertainly correct and
definitely ugly/unmaintainable.

To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case.  apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path.  (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node.  We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)

create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate.  But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.

I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed.  This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.

In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150.  I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't.  For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
2016-06-21 18:38:20 -04:00
Tom Lane 100340e2dc Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080.  It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel.  Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.

There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well.  We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.

Tomas Vondra and Tom Lane

Discussion: <31041.1465069446@sss.pgh.pa.us>
2016-06-18 15:22:34 -04:00
Tom Lane 598aa194af Still another try at fixing scanjoin_target insertion into parallel plans.
The previous code neglected the fact that the scanjoin_target might
carry sortgroupref labelings that we need to absorb.  Instead, do
create_projection_path() unconditionally, and tweak the path's cost
estimate after the fact.  (I'm now convinced that we ought to refactor
the way we account for sometimes not needing a separate projection step,
but right now is not the time for that sort of cleanup.)

Problem identified by Amit Kapila, patch by me.
2016-06-18 00:28:51 -04:00
Tom Lane 915b703e16 Fix handling of argument and result datatypes for partial aggregation.
When doing partial aggregation, the args list of the upper (combining)
Aggref node is replaced by a Var representing the output of the partial
aggregation steps, which has either the aggregate's transition data type
or a serialized representation of that.  However, nodeAgg.c blindly
continued to use the args list as an indication of the user-level argument
types.  This broke resolution of polymorphic transition datatypes at
executor startup (though it accidentally failed to fail for the ANYARRAY
case, which is likely the only one anyone had tested).  Moreover, the
constructed FuncExpr passed to the finalfunc contained completely wrong
information, which would have led to bogus answers or crashes for any case
where the finalfunc examined that information (which is only likely to be
with polymorphic aggregates using a non-polymorphic transition type).

As an independent bug, apply_partialaggref_adjustment neglected to resolve
a polymorphic transition datatype before assigning it as the output type
of the lower-level Aggref node.  This again accidentally failed to fail
for ANYARRAY but would be unlikely to work in other cases.

To fix the first problem, record the user-level argument types in a
separate OID-list field of Aggref, and look to that rather than the args
list when asking what the argument types were.  (It turns out to be
convenient to include any "direct" arguments in this list too, although
those are not currently subject to being overwritten.)

Rather than adding yet another resolve_aggregate_transtype() call to fix
the second problem, add an aggtranstype field to Aggref, and store the
resolved transition type OID there when the planner first computes it.
(By doing this in the planner and not the parser, we can allow the
aggregate's transition type to change from time to time, although no DDL
support yet exists for that.)  This saves nothing of consequence for
simple non-polymorphic aggregates, but for polymorphic transition types
we save a catalog lookup during executor startup as well as several
planner lookups that are new in 9.6 due to parallel query planning.

In passing, fix an error that was introduced into count_agg_clauses_walker
some time ago: it was applying exprTypmod() to something that wasn't an
expression node at all, but a TargetEntry.  exprTypmod silently returned
-1 so that there was not an obvious failure, but this broke the intended
sensitivity of aggregate space consumption estimates to the typmod of
varchar and similar data types.  This part needs to be back-patched.

Catversion bump due to change of stored Aggref nodes.

Discussion: <8229.1466109074@sss.pgh.pa.us>
2016-06-17 21:44:37 -04:00
Robert Haas 54f5c5150f Try again to fix the way the scanjoin_target is used with partial paths.
Commit 04ae11f62e removed some broken
code to apply the scan/join target to partial paths, but its theory
that this processing step is totally unnecessary turns out to be wrong.
Put similar code back again, but this time, check for parallel-safety
and avoid in-place modifications to paths that may already have been
used as part of some other path.

(This is not an entirely elegant solution to this problem; it might
be better, for example, to postpone generate_gather_paths for the
topmost scan/join rel until after the scan/join target has been
applied.  But this is not the time for such redesign work.)

Amit Kapila and Robert Haas
2016-06-17 16:29:07 -04:00
Tom Lane 75be66464c Invent min_parallel_relation_size GUC to replace a hard-wired constant.
The main point of doing this is to allow the cutoff to be set very small,
even zero, to allow parallel-query behavior to be tested on relatively
small tables such as we typically use in the regression tests.  But it
might be of use to users too.  The number-of-workers scaling behavior in
create_plain_partial_paths() is pretty ad-hoc and subject to change, so
we won't expose anything about that, but the notion of not considering
parallel query at all for tables below size X seems reasonably stable.

Amit Kapila, per a suggestion from me

Discussion: <17170.1465830165@sss.pgh.pa.us>
2016-06-16 13:47:20 -04:00
Tom Lane 89d53515e5 In planner.c, avoid assuming that all PathTargets have sortgrouprefs.
The struct definition for PathTarget specifies that a NULL sortgrouprefs
pointer means no sortgroupref labels.  While it's likely that there
should always be at least one labeled column in the places that were
unconditionally fetching through the pointer, it seems wiser to adhere to
the data structure specification and test first.  Add a macro to make this
convenient.  Per experimentation with running the regression tests with a
very small parallelization threshold --- the crash I observed may well
represent a bug elsewhere, but still this coding was not very robust.

Report: <20756.1465834072@sss.pgh.pa.us>
2016-06-13 12:59:25 -04:00
Tom Lane 3303ea1a32 Remove reltarget_has_non_vars flag.
Commit b12fd41c6 added a "reltarget_has_non_vars" field to RelOptInfo,
but failed to maintain it accurately.  Since its only purpose was to skip
calls to has_parallel_hazard() in the simple case where a rel's targetlist
is all Vars, and that call is really pretty cheap in that case anyway, it
seems like this is just a case of premature optimization.  Let's drop the
flag and do the calls unconditionally until it's proven that we need more
smarts here.
2016-06-10 16:20:03 -04:00
Tom Lane 2f153ddfdd Refactor to reduce code duplication for function property checking.
As noted by Andres Freund, we'd accumulated quite a few similar functions
in clauses.c that examine all functions in an expression tree to see if
they satisfy some boolean test.  Reduce the duplication by inventing a
function check_functions_in_node() that applies a simple callback function
to each SQL function OID appearing in a given expression node.  This also
fixes some arguable oversights; for example, contain_mutable_functions()
did not check aggregate or window functions for mutability.  I doubt that
that represents a live bug at the moment, because we don't really consider
mutability for aggregates; but it might someday be one.

I chose to put check_functions_in_node() in nodeFuncs.c because it seemed
like other modules might wish to use it in future.  That in turn forced
moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for
nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean.

In passing, teach contain_leaked_vars_walker() about a few more expression
node types it can safely look through, and improve the rather messy and
undercommented code in has_parallel_hazard_walker().

Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
2016-06-10 16:03:46 -04:00
Tom Lane cae1c788b9 Improve the situation for parallel query versus temp relations.
Transmit the leader's temp-namespace state to workers.  This is important
because without it, the workers do not really have the same search path
as the leader.  For example, there is no good reason (and no extant code
either) to prevent a worker from executing a temp function that the
leader created previously; but as things stood it would fail to find the
temp function, and then either fail or execute the wrong function entirely.

We still prohibit a worker from creating a temp namespace on its own.
In effect, a worker can only see the session's temp namespace if the leader
had created it before starting the worker, which seems like the right
semantics.

Also, transmit the leader's BackendId to workers, and arrange for workers
to use that when determining the physical file path of a temp relation
belonging to their session.  While the original intent was to prevent such
accesses entirely, there were a number of holes in that, notably in places
like dbsize.c which assume they can safely access temp rels of other
sessions anyway.  We might as well get this right, as a small down payment
on someday allowing workers to access the leader's temp tables.  (With
this change, directly using "MyBackendId" as a relation or buffer backend
ID is deprecated; you should use BackendIdForTempRelations() instead.
I left a couple of such uses alone though, as they're not going to be
reachable in parallel workers until we do something about localbuf.c.)

Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down
into localbuf.c, which is where it actually matters, instead of having it
in relation_open().  This amounts to recognizing that access to temp
tables' catalog entries is perfectly safe in a worker, it's only the data
in local buffers that is problematic.

Having done all that, we can get rid of the test in has_parallel_hazard()
that says that use of a temp table's rowtype is unsafe in parallel workers.
That test was unduly expensive, and if we really did need such a
prohibition, that was not even close to being a bulletproof guard for it.
(For example, any user-defined function executed in a parallel worker
might have attempted such access.)
2016-06-09 20:16:11 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas b12fd41c69 Don't generate parallel paths for rels with parallel-restricted outputs.
Such paths are unsafe.  To make it cheaper to detect when this case
applies, track whether a relation's default PathTarget contains any
non-Vars.  In most cases, the answer will be no, which enables us to
determine cheaply that the target list for a proposed path is
parallel-safe.  However, subquery pull-up can create cases that
require us to inspect the target list more carefully.

Amit Kapila, reviewed by me.
2016-06-09 12:43:36 -04:00
Tom Lane e4158319f3 Mop-up for parallel degree-ectomy.
Fix a couple of overlooked uses of "degree" terminology.  Make the parallel
worker count selection logic in create_plain_partial_paths more robust (in
particular, it failed with max_parallel_workers_per_gather set to zero).
2016-06-09 11:16:26 -04:00
Robert Haas c9ce4a1c61 Eliminate "parallel degree" terminology.
This terminology provoked widespread complaints.  So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers.  Rename structure members to match.

These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
2016-06-09 10:00:26 -04:00
Tom Lane 77ba610805 Revert "Use Foreign Key relationships to infer multi-column join selectivity".
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb.  We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult).  The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2.  In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now.  This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.

Discussion: <20160429102531.GA13701@huehner.biz>
2016-06-07 17:21:17 -04:00
Robert Haas 04ae11f62e Remove bogus code to apply PathTargets to partial paths.
The partial paths that get modified may already have been used as
part of a GatherPath which appears in the path list, so modifying
them is not a good idea at this stage - especially because this
code has no check that the PathTarget is in fact parallel-safe.

When partial aggregation is being performed, this is actually
harmless because we'll end up replacing the pathtargets here with
the correct ones within create_grouping_paths().  But if we've got
a query tree containing only scan/join operations then this can
result in incorrectly pushing down parallel-restricted target
list entries.  If those are, for example, references to subqueries,
that can crash the server; but it's wrong in any event.

Amit Kapila
2016-06-03 14:27:33 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Tom Lane aeb9ae6457 Disable physical tlist if any Var would need multiple sortgroupref labels.
As part of upper planner pathification (commit 3fc6e2d7f5) I redid
createplan.c's approach to the physical-tlist optimization, in which scan
nodes are allowed to return exactly the underlying table's columns so as
to save doing a projection step at runtime.  The logic was intentionally
more aggressive than before about applying the optimization, which is
generally a good thing, but Andres Freund found a case in which it got
too aggressive.  Namely, if any column is referenced more than once in
the parent plan node's sorting or grouping column list, we can't optimize
because then that column would need to have more than one ressortgroupref
label, and we only have space for one.

Add logic to detect this situation in use_physical_tlist(), and also add
some error checking in apply_pathtarget_labeling_to_tlist(), which this
example proves was being overly cavalier about whether what it was doing
made any sense.

The added test case exposes the problem only because we do not eliminate
duplicate grouping keys.  That might be something to fix someday, but it
doesn't seem like appropriate post-beta work.

Report: <20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de>
2016-05-26 14:52:30 -04:00
Tom Lane 8a13d5e6d1 Fix infer_arbiter_indexes() to not barf on system columns.
While it could be argued that rejecting system column mentions in the
ON CONFLICT list is an unsupported feature, falling over altogether
just because the table has a unique index on OID is indubitably a bug.

As far as I can tell, fixing infer_arbiter_indexes() is sufficient to
make ON CONFLICT (oid) actually work, though making a regression test
for that case is problematic because of the impossibility of setting
the OID counter to a known value.

Minor cosmetic cleanups along with the bug fix.
2016-05-11 17:06:53 -04:00
Tom Lane 26e66184d6 Fix assorted missing infrastructure for ON CONFLICT.
subquery_planner() failed to apply expression preprocessing to the
arbiterElems and arbiterWhere fields of an OnConflictExpr.  No doubt the
theory was that this wasn't necessary because we don't actually try to
execute those expressions; but that's wrong, because it results in failure
to match to index expressions or index predicates that are changed at all
by preprocessing.  Per bug #14132 from Reynold Smith.

Also add pullup_replace_vars processing for onConflictWhere.  Perhaps
it's impossible to have a subquery reference there, but I'm not exactly
convinced; and even if true today it's a failure waiting to happen.

Also add some comments to other places where one or another field of
OnConflictExpr is intentionally ignored, with explanation as to why it's
okay to do so.

Also, catalog/dependency.c failed to record any dependency on the named
constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to
be dropped while rules exist that depend on it, and allowing pg_dump to
dump such a rule before the constraint it refers to.  The normal execution
path managed to error out reasonably for a dangling constraint reference,
but ruleutils.c dumped core; so in addition to fixing the omission, add
a protective check in ruleutils.c, since we can't retroactively add a
dependency in existing databases.

Back-patch to 9.5 where this code was introduced.

Report: <20160510190350.2608.48667@wrigleys.postgresql.org>
2016-05-11 16:20:23 -04:00
Robert Haas 68d704edbf Minimal fix for crash bug in quals_match_foreign_key.
Discussion is still underway as to whether to revert the entire patch
that added this function, but that discussion may not conclude before
beta1.  So, in the meantime, let's do at least this much.

David Rowley
2016-05-06 15:00:55 -04:00
Tom Lane 2a2435e699 Small improvements to OPTIMIZER_DEBUG code.
Now that Paths have their own rows field, print that rather than
the parent relation's rowcount.

Show the relid sets associated with Paths using table names rather
than numbers; since this code is able to print simple Var references
using table names, it seems a bit silly that print_relids can't.

Print the cheapest_parameterized_paths list for a RelOptInfo, and
include information about a parameterized path's required_outer rels.

Noted while trying to use this feature to debug Alexander Kirkouski's
recent bug report.
2016-04-30 14:08:00 -04:00
Tom Lane c45bf5751b Fix planner crash from pfree'ing a partial path that a GatherPath uses.
We mustn't run generate_gather_paths() during add_paths_to_joinrel(),
because that function can be invoked multiple times for the same target
joinrel.  Not only is it wasteful to build GatherPaths repeatedly, but
a later add_partial_path() could delete the partial path that a previously
created GatherPath depends on.  Instead establish the convention that we
do generate_gather_paths() for a rel only just before set_cheapest().

The code was accidentally not broken for baserels, because as of today there
never is more than one partial path for a baserel.  But that assumption
obviously has a pretty short half-life, so move the generate_gather_paths()
calls for those cases as well.

Also add some generic comments explaining how and why this all works.

Per fuzz testing by Andreas Seltenreich.

Report: <871t5pgwdt.fsf@credativ.de>
2016-04-30 12:29:21 -04:00
Tom Lane 207d5a656e Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like

	Nested Loop
	  ->  Scan X
	  ->  Nested Loop
	    ->  Scan Y
	    ->  Scan Z
	          Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node.  On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members.  So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there.  If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.

This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape.  (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.)  Many
thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-29 20:19:38 -04:00
Robert Haas 59eb551279 Fix EXPLAIN VERBOSE output for parallel aggregate.
The way that PartialAggregate and FinalizeAggregate plan nodes were
displaying output columns before was bogus.  Now, FinalizeAggregate
produces the same outputs as an Aggregate would have produced, while
PartialAggregate produces each of those outputs prefixed by the word
PARTIAL.

Discussion: 12585.1460737650@sss.pgh.pa.us

Patch by me, reviewed by David Rowley.
2016-04-27 07:37:40 -04:00
Robert Haas 77cd477c4b Enable parallel query by default.
Change max_parallel_degree default from 0 to 2.  It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it.  Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de
2016-04-26 08:35:58 -04:00
Tom Lane 80f66a9ad0 Fix planner failure with full join in RHS of left join.
Given a left join containing a full join in its righthand side, with
the left join's joinclause referencing only one side of the full join
(in a non-strict fashion, so that the full join doesn't get simplified),
the planner could fail with "failed to build any N-way joins" or related
errors.  This happened because the full join was seen as overlapping the
left join's RHS, and then recent changes within join_is_legal() caused
that function to conclude that the full join couldn't validly be formed.
Rather than try to rejigger join_is_legal() yet more to allow this,
I think it's better to fix initsplan.c so that the required join order
is explicit in the SpecialJoinInfo data structure.  The previous coding
there essentially ignored full joins, relying on the fact that we don't
flatten them in the joinlist data structure to preserve their ordering.
That's sufficient to prevent a wrong plan from being formed, but as this
example shows, it's not sufficient to ensure that the right plan will
be formed.  We need to work a bit harder to ensure that the right plan
looks sane according to the SpecialJoinInfos.

Per bug #14105 from Vojtech Rylko.  This was apparently induced by
commit 8703059c6 (though now that I've seen it, I wonder whether there
are related cases that could have failed before that); so back-patch
to all active branches.  Unfortunately, that patch also went into 9.0,
so this bug is a regression that won't be fixed in that branch.
2016-04-21 20:05:58 -04:00
Robert Haas 36f69faeff Comment improvements for ForeignPath.
It's not necessarily just scanning a base relation any more.

Amit Langote and Etsuro Fujita
2016-04-21 13:30:48 -04:00
Robert Haas 9c75e1a36b Forbid parallel Hash Right Join or Hash Full Join.
That won't work.  You'll get bogus null-extended rows.

Mithun Cy
2016-04-20 17:48:55 -04:00
Magnus Hagander ba8fe38f58 Fix typo in comment 2016-04-15 13:32:54 +02:00
Robert Haas deb71fa971 Fix costing for parallel aggregation.
The original patch kind of ignored the fact that we were doing something
different from a costing point of view, but nobody noticed.  This patch
fixes that oversight.

David Rowley
2016-04-12 16:25:55 -04:00
Tom Lane f1f01de145 Redefine create_upper_paths_hook as being invoked once per upper relation.
Per discussion, this gives potential users of the hook more flexibility,
because they can build custom Paths that implement only one stage of
upper processing atop core-provided Paths for earlier stages.
2016-04-12 15:23:14 -04:00
Tom Lane 5306df2831 Clean up foreign-key caching code in planner.
Coverity complained that the code added by 015e88942a lacked an
error check for SearchSysCache1 failures, which it should have.  But
the code was pretty duff in other ways too, including failure to think
about whether it could really cope with arrays of different lengths.
2016-04-10 23:47:30 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Robert Haas 0711803775 Use quicksort, not replacement selection, for external sorting.
We still use replacement selection for the first run of the sort only
and only when the number of tuples is relatively small.  Otherwise,
the first run, and subsequent runs in all cases, are produced using
quicksort.  This tends to be faster except perhaps for very small
amounts of working memory.

Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy,
Greg Stark, and me.
2016-04-08 02:36:26 -04:00
Simon Riggs 137805f89a Use Foreign Key relationships to infer multi-column join selectivity
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.

This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.

Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.

Please watch for planner performance regressions; circumstances seem
unlikely but the law of unintended consequences may apply somewhen.
Additional complex tests welcome to prove this before release.

Tests can be performed using SET enable_fkey_estimates = on | off
using scripts provided during Hackers discussions, message id:
552335D9.3090707@2ndquadrant.com

Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
2016-04-08 02:51:09 +01:00
Tom Lane f338dd7585 Refactor join_is_removable() to separate out distinctness-proving logic.
Extracted from pending unique-join patch, since this is a rather large
delta but it's simply moving code out into separately-accessible
subroutines.

I (tgl) did choose to add a bit more logic to rel_supports_distinctness,
so that it verifies that there's at least one potentially usable unique
index rather than just checking indexlist != NIL.  Otherwise there's
no functional change here.

David Rowley
2016-04-07 13:12:31 -04:00
Simon Riggs 015e88942a Load FK defs into relcache for use by planner
Fastpath ignores this if no triggers defined.

Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
2016-04-07 12:08:33 +01:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Tom Lane f9aefcb91f Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate.  This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all.  So index-only scans were
essentially unavailable with such partial indexes.

To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner.  This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes.  However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes.  We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed.  We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.

Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 14:49:10 -04:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Robert Haas f9143d102f Rework custom scans to work more like the new extensible node stuff.
Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5.  Rework the latter to be more like the former.

This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.

KaiGai Kohei, reviewed by Petr Jelinek and me.  Some further
cosmetic changes by me.
2016-03-29 11:28:04 -04:00
Robert Haas 5d4171d1c7 Don't require a user mapping for FDWs to work.
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.

Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
2016-03-28 21:50:28 -04:00
Tom Lane 76281aa964 Avoid a couple of zero-divide scenarios in the planner.
cost_subplan() supposed that the given subplan must have plan_rows > 0,
which as far as I can tell was true until recent refactoring of the
code in createplan.c; but now that code allows the Result for a provably
empty subquery to have plan_rows = 0.  Rather than undo that change,
put in a clamp to prevent zero divide.

get_cheapest_fractional_path() likewise supposed that best_path->rows > 0.
This assumption has been wrong for longer.  It's actually harmless given
IEEE float math, because a positive value divided by zero gives +Infinity
and compare_fractional_path_costs() will do the right thing with that.
Still, best not to assume that.

final_cost_nestloop() also seems to have some risks in this area, so
borrow the clamping logic already present in the mergejoin cost functions.

Lastly, remove unnecessary clamp_row_est() in planner.c's calls to
get_number_of_groups().  The only thing that function does with path_rows
is pass it to estimate_num_groups() which already has an internal clamp,
so we don't need the extra call; and if we did, the callers are arguably
the wrong place for it anyway.

First two items reported by Piotr Stefaniak, the others are products
of my nosing around for similar problems.  No back-patch since there's
no evidence that problems arise in the back branches.
2016-03-26 12:03:12 -04:00
Tom Lane d543170f2f Don't split up SRFs when choosing to postpone SELECT output expressions.
In commit 9118d03a8c we taught the planner to postpone evaluation of
set-returning functions in a SELECT's targetlist until after any sort done
to satisfy ORDER BY.  However, if we postpone some SRFs this way while
others do not get postponed (because they're sort or group key columns)
we will break the traditional behavior by which all SRFs in the tlist run
in-step during ExecTargetList(), so that you get the least common multiple
of their periods not the product.  Fix make_sort_input_target() so it will
not split up SRF evaluation in such cases.

There is still a hazard of similar odd behavior if there's a SRF in a
grouping column and another one that isn't, but that was true before
and we're just trying to preserve bug-compatibility with the traditional
behavior.  This whole area is overdue to be rethought and reimplemented,
but we'll try to avoid changing behavior until then.

Per report from Regina Obe.
2016-03-25 11:19:51 -04:00
Robert Haas e06a38965b Support parallel aggregation.
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.

David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
2016-03-21 09:30:18 -04:00
Robert Haas 0bf3ae88af Directly modify foreign tables.
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.

Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
2016-03-18 13:55:52 -04:00
Robert Haas 992b5ba30d Push scan/join target list beneath Gather when possible.
This means that, for example, "SELECT expensive_func(a) FROM bigtab
WHERE something" can compute expensive_func(a) in the workers rather
than the leader if it happens to be parallel-safe, which figures to be
a big win in some practical cases.

Currently, we can only do this if the entire target list is
parallel-safe.  If we worked harder, we might be able to evaluate
parallel-safe targets in the worker and any parallel-restricted
targets in the leader, but that would be more complicated, and there
aren't that many parallel-restricted functions that people are likely
to use in queries anyway.  I think.  So just do the simple thing for
the moment.

Robert Haas, Amit Kapila, and Tom Lane
2016-03-18 09:50:05 -04:00
Robert Haas 0218e8b3fa Fix typos.
Jim Nasby
2016-03-17 07:26:20 -04:00
Robert Haas 3aff33aa68 Fix typos.
Oskari Saarenmaa
2016-03-15 18:06:11 -04:00
Tom Lane 101fd9349e Add a GetForeignUpperPaths callback function for FDWs.
This is basically like the just-added create_upper_paths_hook, but
control is funneled only to the FDW responsible for all the baserels
of the current query; so providing such a callback is much less likely
to add useless overhead than using the hook function is.

The documentation is a bit sketchy.  We'll likely want to improve it,
and/or adjust the call conventions, when we get some experience with
actually using this callback.  Hopefully somebody will find time to
experiment with it before 9.6 feature freeze.
2016-03-14 20:04:48 -04:00
Tom Lane 5864d6a4b6 Provide a planner hook at a suitable place for creating upper-rel Paths.
In the initial revision of the upper-planner pathification work, the only
available way for an FDW or custom-scan provider to inject Paths
representing post-scan-join processing was to insert them during scan-level
GetForeignPaths or similar processing.  While that's not impossible, it'd
require quite a lot of duplicative processing to look forward and see if
the extension would be capable of implementing the whole query.  To improve
matters for custom-scan providers, provide a hook function at the point
where the core code is about to start filling in upperrel Paths.  At this
point Paths are available for the whole scan/join tree, which should reduce
the amount of redundant effort considerably.

(An alternative design that was suggested was to provide a separate hook
for each post-scan-join processing step, but that seems messy and not
clearly more useful.)

Following our time-honored tradition, there's no documentation for this
hook outside the source code.

As-is, this hook is only meant for custom scan providers, which we can't
assume very much about.  A followon patch will implement an FDW callback
to let FDWs do the same thing in a somewhat more structured fashion.
2016-03-14 19:23:29 -04:00
Tom Lane 28048cbaa2 Allow callers of create_foreignscan_path to specify nondefault PathTarget.
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths.  So break API compatibility by adding a PathTarget
argument to create_foreignscan_path().  To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14 17:31:28 -04:00
Tom Lane 307c78852f Rethink representation of PathTargets.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node.  In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets.  Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an indepedent Node type, even if it does cost us one more
palloc to create a RelOptInfo.  So change it while we still can.

This commit just changes the representation, without doing anything more
interesting than that.
2016-03-14 16:59:59 -04:00
Robert Haas 6be84eeb8d Update more comments for 96198d94cb.
Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
Bapat, and slightly expanded by me.
2016-03-14 14:29:12 -04:00
Tom Lane 570be1f73f Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and don't wish to redesign their code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
2016-03-12 12:12:59 -05:00
Tom Lane 9118d03a8c When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done.  Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order.  For example, in
	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step.  Getting the desired
behavior reliably required contortions like
	SELECT x, nextval('seq')
	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary.  Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse.  Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement.  We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me
2016-03-11 12:27:50 -05:00
Tom Lane 49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Tom Lane c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Tom Lane 8776c15c85 Fix incorrect tlist generation in create_gather_plan().
This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.
2016-03-09 10:56:46 -05:00
Tom Lane 51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Tom Lane 9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Tom Lane 61fd218930 Fix minor thinko in pathification code.
I passed the wrong "root" struct to create_pathtarget in build_minmax_path.
Since the subroot is a clone of the outer root, this would not cause any
serious problems, but it would waste some cycles because
set_pathtarget_cost_width would not have access to Var width estimates
set up while running query_planner on the subroot.
2016-03-08 16:50:44 -05:00
Tom Lane 8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Robert Haas 7400559a3f Comment update for fdw_recheck_quals.
Commit 5fc4c26db5 could've done a better
job updating these comments.

Etsuro Fujita
2016-03-08 14:40:55 -05:00
Tom Lane cf8e7b16a5 Spell "parallel" correctly.
Per David Rowley.
2016-03-07 21:48:17 -05:00
Tom Lane 3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
Tom Lane 05893712cc Fix build under OPTIMIZER_DEBUG.
In commit 19a541143a I replaced RelOptInfo.width with
RelOptInfo.reltarget.width, but I missed updating debug_print_rel()
for that because it's not compiled by default.
Reported by Salvador Fandino, patch by Michael Paquier.
2016-02-29 10:14:12 -05:00
Dean Rasheed 41fedc2462 Fix incorrect varlevelsup in security_barrier_replace_vars().
When converting an RTE with securityQuals into a security barrier
subquery RTE, ensure that the Vars in the new subquery's targetlist
all have varlevelsup = 0 so that they correctly refer to the
underlying base relation being wrapped.

The original code was creating new Vars by copying them from existing
Vars referencing the base relation found elsewhere in the query, but
failed to account for the fact that such Vars could come from sublink
subqueries, and hence have varlevelsup > 0. In practice it looks like
this could only happen with nested security barrier views, where the
outer view has a WHERE clause containing a correlated subquery, due to
the order in which the Vars are processed.

Bug: #13988
Reported-by: Adam Guthrie
Backpatch-to: 9.4, where updatable SB views were introduced
2016-02-29 12:28:06 +00:00
Robert Haas 35746bc348 Add new FDW API to test for parallel-safety.
This is basically a bug fix; the old code assumes that a ForeignScan
is always parallel-safe, but for postgres_fdw, for example, this is
definitely false.  It should be true for file_fdw, though, since a
worker can read a file from the filesystem just as well as any other
backend process.

Original patch by Thomas Munro.  Documentation, and changes to the
comments, by me.
2016-02-26 16:14:46 +05:30
Tom Lane 19a541143a Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist).  However, there are good
reasons to remove that assumption.  For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free".  While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node).  Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next.  Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit.  It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.

While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist.  To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)

I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation.  Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't.  Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths.  This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more.  This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-18 20:02:03 -05:00
Tom Lane d4c3a156cb Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.

This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:34:59 -05:00
Tom Lane 2564be360a Fix typo in comment. 2016-02-11 15:20:14 -05:00
Andres Freund a6897efab9 Fix overeager pushdown of HAVING clauses when grouping sets are used.
In 61444bfb we started to allow HAVING clauses to be fully pushed down
into WHERE, even when grouping sets are in use. That turns out not to
work correctly, because grouping sets can "produce" NULLs, meaning that
filtering in WHERE and HAVING can have different results, even when no
aggregates or volatile functions are involved.

Instead only allow pushdown of empty grouping sets.

It'd be nice to do better, but the exact mechanics of deciding which
cases are safe are still being debated. It's important to give correct
results till we find a good solution, and such a solution might not be
appropriate for backpatching anyway.

Bug: #13863
Reported-By: 'wrb'
Diagnosed-By: Dean Rasheed
Author: Andrew Gierth
Reviewed-By: Dean Rasheed and Andres Freund
Discussion: 20160113183558.12989.56904@wrigleys.postgresql.org
Backpatch: 9.5, where grouping sets were introduced
2016-02-08 11:03:31 +01:00
Robert Haas 7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Alvaro Herrera 3cb5867b7d Don't test for system columns on join relations
create_foreignscan_plan needs to know whether any system columns are
requested from a relation (this flag is needed by ForeignNext during
execution).  However, for join relations this is a pointless test,
because it's not possible to request system columns from them, so
remove the check.

Author: Etsuro Fujita
Discussion: http://www.postgresql.org/message-id/56AA0FC5.9000207@lab.ntt.co.jp
Reviewed-by: David Rowley, Robert Haas
2016-02-02 19:20:02 +01:00
Robert Haas fbe5a3fb73 Only try to push down foreign joins if the user mapping OIDs match.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
2016-01-28 14:05:36 -05:00
Robert Haas eaf7b1f643 Assert that create_unique_path returns non-NULL.
Per off-list discussion with Tom Lane and Michael Paquier, Coverity
gets unhappy if this is not done.
2016-01-27 22:03:18 -05:00
Tom Lane b99551832e Add defenses against putting expanded objects into Const nodes.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons.  It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime.  Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself.  This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.

I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.

The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems.  Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)

The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum().  That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.

Back-patch to 9.5 where the expanded-object facility was added.  It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
2016-01-21 12:56:08 -05:00
Robert Haas 45be99f8cd Support parallel joins, and make related improvements.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.

This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20 14:40:26 -05:00
Robert Haas a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Tom Lane 49b4950650 Add explicit cast to amcostestimate call.
My compiler doesn't complain here, but David Rowley's does ...
2016-01-17 22:56:16 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane 8d290c8ec6 Re-pgindent a few files.
In preparation for landing index AM interface changes.
2016-01-17 19:13:18 -05:00
Tom Lane a923af382c Fix build_grouping_chain() to not clobber its input lists.
There's no good reason for stomping on the input data; it makes the logic
in this function no simpler, in fact probably the reverse.  And it makes
it impossible to separate path generation from plan generation, as I'm
working towards doing; that will require more than one traversal of these
lists.
2016-01-14 11:51:57 -05:00
Robert Haas 950ab82c3d Remove obsolete comment.
Noted while reviewing a question from Dickson S. Guedes.
2016-01-10 21:35:33 -05:00
Tom Lane a54676acad Marginal cleanup of GROUPING SETS code in grouping_planner().
Improve comments and make it a shade less messy.  I think we might want
to move all of this somewhere else later, but it needs to be more
readable first.

In passing, re-pgindent the file, affecting some recently-added comments
concerning parallel query planning.
2016-01-07 20:32:35 -05:00
Tom Lane c44d013835 Delay creation of subplan tlist until after create_plan().
Once upon a time it was necessary for grouping_planner() to determine
the tlist it wanted from the scan/join plan subtree before it called
query_planner(), because query_planner() would actually make a Plan using
that.  But we refactored things a long time ago to delay construction of
the Plan tree till later, so there's no need to build that tlist until
(and indeed unless) we're ready to plaster it onto the Plan.  The only
thing query_planner() cares about is what Vars are going to be needed for
the tlist, and it can perfectly well get that by looking at the real tlist
rather than some masticated version.

Well, actually, there is one minor glitch in that argument, which is that
make_subplanTargetList also adds Vars appearing only in HAVING to the
tlist it produces.  So now we have to account for HAVING explicitly in
build_base_rel_tlists.  But that just adds a few lines of code, and
I doubt it moves the needle much on processing time; we might be doing
pull_var_clause() twice on the havingQual, but before we had it scanning
dummy tlist entries instead.

This is a very small down payment on rationalizing grouping_planner
enough so it can be refactored.
2016-01-07 20:23:57 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane efe4c9d704 Add some comments about division of labor between rewriter and planner.
The rationale for the way targetlist processing is done wasn't clearly
stated anywhere, and I for one had forgotten some of the details.  Having
just painfully re-learned them, add some breadcrumbs for the next person.
2015-12-29 18:50:35 -05:00
Robert Haas 51d152f18e Change Gather not to use a physical tlist.
This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.
2015-12-23 13:41:06 -05:00
Robert Haas ccd8f97922 postgres_fdw: Consider requesting sorted data so we can do a merge join.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.  Commit f18c944b61
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.

For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful.  Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.

Ashutosh Bapat, with very substantial cosmetic revisions by me.  Also
reviewed by Rushabh Lathia.
2015-12-22 13:46:40 -05:00
Stephen Frost e5e11c8cca Collect the global OR of hasRowSecurity flags for plancache
We carry around information about if a given query has row security or
not to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.

Previously, the flag of one of the subqueries was simply being copied
into place to indicate if the query overall included RLS components.
That's wrong as we need the global OR of all subqueries.  Fix by
changing the code to match how fireRIRules works, which is results
in OR'ing all of the flags.

Noted by Tom.

Back-patch to 9.5 where RLS was introduced.
2015-12-14 20:05:43 -05:00
Tom Lane 4fcf48450d Get rid of the planner's LateralJoinInfo data structure.
I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cac that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency.  The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references.  For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation.  Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code.  This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.

Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.
2015-12-11 15:52:38 -05:00
Tom Lane acfcd45cac Still more fixes for planner's handling of LATERAL references.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references.  If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too.  The planner did not understand this and would generate
intermediate joins that could not be used.  While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.

To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent.  This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.

Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations.  This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.

Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1.  Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures.  The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen.  I chose to move the
supporting logic into joinrels.c where the latter functions are.

Back-patch to 9.3 where LATERAL support was introduced.
2015-12-11 14:22:20 -05:00
Robert Haas 385f337c9f Allow foreign and custom joins to handle EvalPlanQual rechecks.
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join.  To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.

To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method.  Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).

KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
2015-12-08 12:31:03 -05:00
Tom Lane edca44b152 Simplify LATERAL-related calculations within add_paths_to_joinrel().
While convincing myself that commit 7e19db0c09 would solve both of
the problems recently reported by Andreas Seltenreich, I realized that
add_paths_to_joinrel's handling of LATERAL restrictions could be made
noticeably simpler and faster if we were to retain the minimum possible
parameterization for each joinrel (that is, the set of relids supplying
unsatisfied lateral references in it).  We already retain that for
baserels, in RelOptInfo.lateral_relids, so we can use that field for
joinrels too.

I re-pgindent'd the files touched here, which affects some unrelated
comments.

This is, I believe, just a minor optimization not a bug fix, so no
back-patch.
2015-12-07 18:56:17 -05:00
Tom Lane 7e19db0c09 Fix another oversight in checking if a join with LATERAL refs is legal.
It was possible for the planner to decide to join a LATERAL subquery to
the outer side of an outer join before the outer join itself is completed.
Normally that's fine because of the associativity rules, but it doesn't
work if the subquery contains a lateral reference to the inner side of the
outer join.  In such a situation the outer join *must* be done first.
join_is_legal() missed this consideration and would allow the join to be
attempted, but the actual path-building code correctly decided that no
valid join path could be made, sometimes leading to planner errors such as
"failed to build any N-way joins".

Per report from Andreas Seltenreich.  Back-patch to 9.3 where LATERAL
support was added.
2015-12-07 17:42:11 -05:00
Robert Haas c7485a82c3 Add handling for GatherPath to print_path.
Peter Geoghegan
2015-12-02 08:19:50 -05:00
Robert Haas 7907a949ab Fix incomplete set_foreignscan_references handling for fdw_recheck_quals
KaiGai Kohei
2015-11-18 22:12:21 -05:00
Robert Haas 80558c1f5a Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.

Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.

Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.

Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.

Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 09:02:52 -05:00
Robert Haas f0661c4e8c Make sequential scans parallel-aware.
In addition, this path fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Robert Haas 5c90a2ffdd Comment update.
Adjust to account for 5fc4c26db5.

Etsuro Fujita
2015-11-09 13:48:50 -05:00
Tom Lane 59464bd6f9 Improve implementation of GEQO's init_tour() function.
Rather than filling a temporary array and then copying values to the
output array, we can generate the required random permutation in-place
using the Fisher-Yates shuffle algorithm.  This is shorter as well as
more efficient than before.  It's pretty unlikely that anyone would
notice a speed improvement, but shorter code is better.

Nathan Wagner, edited a bit by me
2015-11-05 10:46:14 -05:00
Robert Haas 8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but it couldn't, even though it did call initialize its
projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Robert Haas a53c06a13e Prohibit parallel query when the isolation level is serializable.
In order for this to be safe, the code which hands true serializability
will need to taught that the SIRead locks taken by a parallel worker
pertain to the same transaction as those taken by the parallel leader.
Some further changes may be needed as well.  Until the necessary
adaptations are made, don't generate parallel plans in serializable
mode, and if a previously-generated parallel plan is used after
serializable mode has been activated, run it serially.

This fixes a bug in commit 7aea8e4f2d.
2015-10-16 11:58:27 -04:00
Robert Haas 5fc4c26db5 Allow FDWs to push down quals without breaking EvalPlanQual rechecks.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a paramaterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.

This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down.  Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.

Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
2015-10-15 13:00:40 -04:00
Stephen Frost b7aac36245 Handle append_rel_list in expand_security_qual
During expand_security_quals, we take the security barrier quals on an
RTE and create a subquery which evaluates the quals.  During this, we
have to replace any variables in the outer query which refer to the
original RTE with references to the columns from the subquery.

We need to also perform that replacement for any Vars in the
append_rel_list.

Only backpatching to 9.5 as we only go through this process in 9.4 for
auto-updatable security barrier views, which UNION ALL queries aren't.

Discovered by Haribabu Kommi

Patch by Dean Rasheed
2015-10-09 10:49:02 -04:00
Andres Freund ad22783792 Fix several bugs related to ON CONFLICT's EXCLUDED pseudo relation.
Four related issues:

1) attnos/varnos/resnos for EXCLUDED were out of sync when a column
   after one dropped in the underlying relation was referenced.
2) References to whole-row variables (i.e. EXCLUDED.*) lead to errors.
3) It was possible to reference system columns in the EXCLUDED pseudo
   relations, even though they would not have valid contents.
4) References to EXCLUDED were rewritten by the RLS machinery, as
   EXCLUDED was treated as if it were the underlying relation.

To fix the first two issues, generate the excluded targetlist with
dropped columns in mind and add an entry for whole row
variables. Instead of unconditionally adding a wholerow entry we could
pull up the expression if needed, but doing it unconditionally seems
simpler. The wholerow entry is only really needed for ruleutils/EXPLAIN
support anyway.

The remaining two issues are addressed by changing the EXCLUDED RTE to
have relkind = composite. That fits with EXCLUDED not actually being a
real relation, and allows to treat it differently in the relevant
places. scanRTEForColumn now skips looking up system columns when the
RTE has a composite relkind; fireRIRrules() already had a corresponding
check, thereby preventing RLS expansion on EXCLUDED.

Also add tests for these issues, and improve a few comments around
excluded handling in setrefs.c.

Reported-By: Peter Geoghegan, Geoff Winkless
Author: Andres Freund, Amit Langote, Peter Geoghegan
Discussion: CAEzk6fdzJ3xYQZGbcuYM2rBd2BuDkUksmK=mY9UYYDugg_GgZg@mail.gmail.com,
   CAM3SWZS+CauzbiCEcg-GdE6K6ycHE_Bz6Ksszy8AoixcMHOmsA@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-10-03 15:12:10 +02:00
Tom Lane 21995d3f6d Fix documentation error in commit 8703059c6b.
Etsuro Fujita spotted a thinko in the README commentary.
2015-10-01 10:32:11 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Robert Haas 758fcfdc01 Comment update for join pushdown.
Etsuro Fujita
2015-09-29 07:42:30 -04:00
Robert Haas d1b7c1ffe7 Parallel executor support.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master.  User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not.  Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.

Amit Kapila and Robert Haas
2015-09-28 21:55:57 -04:00
Tom Lane 39df0f150c Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
Robert Haas 7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Tom Lane 87efbc2be1 Fix setrefs.c comment properly.
The "typo" alleged in commit 1e460d4bd was actually a comment that was
correct when written, but I missed updating it in commit b5282aa89.
Use a slightly less specific (and hopefully more future-proof) description
of what is collected.  Back-patch to 9.2 where that commit appeared, and
revert the comment to its then-entirely-correct state before that.
2015-09-10 10:23:56 -04:00
Stephen Frost 1e460d4bd6 Fix typo in setrefs.c
We're adding OIDs, not TIDs, to invalItems.

Pointed out by Etsuro Fujita.

Back-patch to all supported branches.
2015-09-10 09:22:03 -04:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tom Lane cfe30a72fa Undo mistaken tightening in join_is_legal().
One of the changes I made in commit 8703059c6b turns out not to have
been such a good idea: we still need the exception in join_is_legal() that
allows a join if both inputs already overlap the RHS of the special join
we're checking.  Otherwise we can miss valid plans, and might indeed fail
to find a plan at all, as in recent report from Andreas Seltenreich.

That code was added way back in commit c17117649b, but I failed to
include a regression test case then; my bad.  Put it back with a better
explanation, and a test this time.  The logic does end up a bit different
than before though: I now believe it's appropriate to make this check
first, thereby allowing such a case whether or not we'd consider the
previous SJ(s) to commute with this one.  (Presumably, we already decided
they did; but it was confusing to have this consideration in the middle
of the code that was handling the other case.)

Back-patch to all active branches, like the previous patch.
2015-08-12 21:19:03 -04:00
Tom Lane 68fa28f771 Postpone extParam/allParam calculations until the very end of planning.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree.  We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded).  Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.

The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient.  (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.

Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan.  So leave that idea for another day.
2015-08-11 23:48:37 -04:00
Tom Lane 4200a92862 Further mucking with PlaceHolderVar-related restrictions on join order.
Commit 85e5e222b1 turns out not to have taken
care of all cases of the partially-evaluatable-PlaceHolderVar problem found
by Andreas Seltenreich's fuzz testing.  I had set it up to check for risky
PHVs only in the event that we were making a star-schema-based exception to
the param_source_rels join ordering heuristic.  However, it turns out that
the problem can occur even in joins that satisfy the param_source_rels
heuristic, in which case allow_star_schema_join() isn't consulted.
Refactor so that we check for risky PHVs whenever the proposed join has
any remaining parameterization.

Back-patch to 9.2, like the previous patch (except for the regression test
case, which only works back to 9.3 because it uses LATERAL).

Note that this discovery implies that problems of this sort could've
occurred in 9.2 and up even before the star-schema patch; though I've not
tried to prove that experimentally.
2015-08-10 17:18:17 -04:00
Tom Lane 89db83922a Further adjustments to PlaceHolderVar removal.
A new test case from Andreas Seltenreich showed that we were still a bit
confused about removing PlaceHolderVars during join removal.  Specifically,
remove_rel_from_query would remove a PHV that was used only underneath
the removable join, even if the place where it's used was the join partner
relation and not the join clause being deleted.  This would lead to a
"too late to create a new PlaceHolderInfo" error later on.  We can defend
against that by checking ph_eval_at to see if the PHV could possibly be
getting used at some partner rel.

Also improve some nearby LATERAL-related logic.  I decided that the check
on ph_lateral needed to take precedence over the check on ph_needed, in
case there's a lateral reference underneath the join being considered.
(That may be impossible, but I'm not convinced of it, and it's easy enough
to defend against the case.)  Also, I realized that remove_rel_from_query's
logic for updating LateralJoinInfos is dead code, because we don't build
those at all until after join removal.

Back-patch to 9.3.  Previous versions didn't have the LATERAL issues, of
course, and they also didn't attempt to remove PlaceHolderInfos during join
removal.  (I'm starting to wonder if changing that was really such a great
idea.)
2015-08-07 14:13:50 -04:00
Tom Lane bab163e121 Fix old oversight in join removal logic.
Commit 9e7e29c75a introduced an Assert that
join removal didn't reduce the eval_at set of any PlaceHolderVar to empty.
At first glance it looks like join_is_removable ensures that's true --- but
actually, the loop in join_is_removable skips PlaceHolderVars that are not
referenced above the join due to be removed.  So, if we don't want any
empty eval_at sets, the right thing to do is to delete any now-unreferenced
PlaceHolderVars from the data structure entirely.

Per fuzz testing by Andreas Seltenreich.  Back-patch to 9.3 where the
aforesaid Assert was added.
2015-08-06 22:14:27 -04:00
Tom Lane cde35cf4ae Fix eclass_useful_for_merging to give valid results for appendrel children.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child.  In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases.  Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns.  However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.

Backpatch to 9.3.  In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur.  It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
2015-08-06 20:14:53 -04:00
Tom Lane 8703059c6b Further fixes for degenerate outer join clauses.
Further testing revealed that commit f69b4b9495 was still a few
bricks shy of a load: minor tweaking of the previous test cases resulted
in the same wrong-outer-join-order problem coming back.  After study
I concluded that my previous changes in make_outerjoininfo() were just
accidentally masking the problem, and should be reverted in favor of
forcing syntactic join order whenever an upper outer join's predicate
doesn't mention a lower outer join's LHS.  This still allows the
chained-outer-joins style that is the normally optimizable case.

I also tightened things up some more in join_is_legal().  It seems to me
on review that what's really happening in the exception case where we
ignore a mismatched special join is that we're allowing the proposed join
to associate into the RHS of the outer join we're comparing it to.  As
such, we should *always* insist that the proposed join be a left join,
which eliminates a bunch of rather dubious argumentation.  The case where
we weren't enforcing that was the one that was already known buggy anyway
(it had a violatable Assert before the aforesaid commit) so it hardly
deserves a lot of deference.

Back-patch to all active branches, like the previous patch.  The added
regression test case failed in all branches back to 9.1, and I think it's
only an unrelated change in costing calculations that kept 9.0 from
choosing a broken plan.
2015-08-06 15:35:46 -04:00
Tom Lane 6af9ee4c8c Make real sure we don't reassociate joins into or out of SEMI/ANTI joins.
Per the discussion in optimizer/README, it's unsafe to reassociate anything
into or out of the RHS of a SEMI or ANTI join.  An example from Piotr
Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this
rule, so lock it down a little harder.

I couldn't find a reasonably simple example of the optimizer trying to
do this, so no new regression test.  (Piotr's example involved the random
search in GEQO accidentally trying an invalid case and triggering a sanity
check way downstream in clause selectivity estimation, which did not seem
like a sequence of events that would be useful to memorialize in a
regression test as-is.)

Back-patch to all active branches.
2015-08-05 14:39:29 -04:00
Tom Lane 85e5e222b1 Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a7460d, I changed the planner
so that it would allow nestloop paths to remain partially parameterized,
ie the inner relation might need parameters from both the current outer
relation and some upper-level outer relation.  That's fine so long as we're
talking about distinct parameters; but the patch also allowed creation of
nestloop paths for cases where the inner relation's parameter was a
PlaceHolderVar whose eval_at set included the current outer relation and
some upper-level one.  That does *not* work.

In principle we could allow such a PlaceHolderVar to be evaluated at the
lower join node using values passed down from the upper relation along with
values from the join's own outer relation.  However, nodeNestloop.c only
supports simple Vars not arbitrary expressions as nestloop parameters.
createplan.c is also a few bricks shy of being able to handle such cases;
it misplaces the PlaceHolderVar parameters in the plan tree, which is why
the visible symptoms of this bug are "plan should not reference subplan's
variable" and "failed to assign all NestLoopParams to plan nodes" planner
errors.

Adding the necessary complexity to make this work doesn't seem like it
would be repaid in significantly better plans, because in cases where such
a PHV exists, there is probably a corresponding join order constraint that
would allow a good plan to be found without using the star-schema exception.
Furthermore, adding complexity to nodeNestloop.c would create a run-time
penalty even for plans where this whole consideration is irrelevant.
So let's just reject such paths instead.

Per fuzz testing by Andreas Seltenreich; the added regression test is based
on his example query.  Back-patch to 9.2, like the previous patch.
2015-08-04 14:55:50 -04:00
Tom Lane f69b4b9495 Fix some planner issues with degenerate outer join clauses.
An outer join clause that didn't actually reference the RHS (perhaps only
after constant-folding) could confuse the join order enforcement logic,
leading to wrong query results.  Also, nested occurrences of such things
could trigger an Assertion that on reflection seems incorrect.

Per fuzz testing by Andreas Seltenreich.  The practical use of such cases
seems thin enough that it's not too surprising we've not heard field
reports about it.

This has been broken for a long time, so back-patch to all active branches.
2015-08-01 20:57:41 -04:00
Tom Lane dea1491ffb Teach predtest.c that "foo" implies "foo IS NOT NULL".
Per complaint from Peter Holzer.  It's useful to cover this special case,
since for a boolean variable "foo", earlier parts of the planner will have
reduced variants like "foo = true" to just "foo", and thus we may fail
to recognize the applicability of a partial index with predicate
"foo IS NOT NULL".

Back-patch to 9.5, but not further; given the lack of previous complaints
this doesn't seem like behavior to change in stable branches.
2015-08-01 14:31:46 -04:00
Tom Lane a6492ff897 Fix an oversight in checking whether a join with LATERAL refs is legal.
In many cases, we can implement a semijoin as a plain innerjoin by first
passing the righthand-side relation through a unique-ification step.
However, one of the cases where this does NOT work is where the RHS has
a LATERAL reference to the LHS; that makes the RHS dependent on the LHS
so that unique-ification is meaningless.  joinpath.c understood this,
and so would not generate any join paths of this kind ... but join_is_legal
neglected to check for the case, so it would think that we could do it.
The upshot would be a "could not devise a query plan for the given query"
failure once we had failed to generate any join paths at all for the bogus
join pair.

Back-patch to 9.3 where LATERAL was added.
2015-07-31 19:26:33 -04:00
Tom Lane 8693ebe37d Avoid some zero-divide hazards in the planner.
Although I think on all modern machines floating division by zero
results in Infinity not SIGFPE, we still don't want infinities
running around in the planner's costing estimates; too much risk
of that leading to insane behavior.

grouping_planner() failed to consider the possibility that final_rel
might be known dummy and hence have zero rowcount.  (I wonder if it
would be better to set a rows estimate of 1 for dummy relations?
But at least in the back branches, changing this convention seems
like a bad idea, so I'll leave that for another day.)

Make certain that get_variable_numdistinct() produces a nonzero result.
The case that can be shown to be broken is with stadistinct < 0.0 and
small ntuples; we did not prevent the result from rounding to zero.
For good luck I applied clamp_row_est() to all the nonconstant return
values.

In ExecChooseHashTableSize(), Assert that we compute positive nbuckets
and nbatch.  I know of no reason to think this isn't the case, but it
seems like a good safety check.

Per reports from Piotr Stefaniak.  Back-patch to all active branches.
2015-07-30 12:11:23 -04:00
Robert Haas f04ce31475 Fix incorrect comment.
Amit Langote
2015-07-29 16:47:12 -04:00
Tom Lane 95f4e59c32 Remove an unsafe Assert, and explain join_clause_is_movable_into() better.
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level.  This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact.  Comment out the Assert, and add a bunch of commentary
to clarify what's going on.

Per fuzz testing by Andreas Seltenreich.  The added regression test is
a pretty silly query, but it's based on his crasher example.

Back-patch to 9.2 where the faulty logic was introduced.
2015-07-28 13:20:39 -04:00
Tom Lane fca8e59c1c Fix oversight in flattening of subqueries with empty FROM.
I missed a restriction that commit f4abd0241d
should have enforced: we can't pull up an empty-FROM subquery if it's under
an outer join, because then we'd need to wrap its output columns in
PlaceHolderVars.  As the code currently stands, the PHVs end up with empty
relid sets, which doesn't work (and is correctly caught by an Assert).

It's possible that this could be fixed by assigning the PHVs the relid
sets of the parent FromExpr/JoinExpr, but getting that to work is more
complication than I care to add right now; indeed it's likely that
we'll never bother, since pulling up empty-FROM subqueries is a rather
marginal optimization anyway.

Per report from Andreas Seltenreich.  Back-patch to 9.5 where the faulty
code was added.
2015-07-26 17:44:27 -04:00
Tom Lane 358eaa01bf Make entirely-dummy appendrels get marked as such in set_append_rel_size.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL).  When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount.  Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist.  (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)

In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.

Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.

Per report from Andreas Seltenreich.  Back-patch to 9.2.  Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist.  It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
2015-07-26 16:19:08 -04:00
Andres Freund 159cff58cf Check the relevant index element in ON CONFLICT unique index inference.
ON CONFLICT unique index inference had a thinko that could affect cases
where the user-supplied inference clause required that an attribute
match a particular (user specified) collation and/or opclass.

infer_collation_opclass_match() has to check for opclass and/or
collation matches and that the attribute is in the list of attributes or
expressions known to be in the definition of the index under
consideration. The bug was that these two conditions weren't necessarily
evaluated for the same index attribute.

Author: Peter Geoghegan
Discussion: CAM3SWZR4uug=WvmGk7UgsqHn2MkEzy9YU-+8jKGO4JPhesyeWg@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-07-26 18:20:41 +02:00
Andres Freund 61444bfb80 Allow to push down clauses from HAVING to WHERE when grouping sets are used.
Previously we disallowed pushing down quals to WHERE in the presence of
grouping sets. That's overly restrictive.

We now instead copy quals to WHERE if applicable, leaving the
one in HAVING in place. That's because, at that stage of the planning
process, it's nontrivial to determine if it's safe to remove the one in
HAVING.

Author: Andrew Gierth
Discussion: 874mkt3l59.fsf@news-spur.riddles.org.uk
Backpatch: 9.5, where grouping sets were introduced. This isn't exactly
    a bugfix, but it seems better to keep the branches in sync at this point.
2015-07-26 16:50:20 +02:00
Andres Freund e6d8cb77c0 Recognize GROUPING() as a aggregate expression.
Previously GROUPING() was not recognized as a aggregate expression,
erroneously allowing the planner to move it from HAVING to WHERE.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=WG9omG5rFOMAYBweJxmpTaapvVp5pCeMrE6BfpCwr4Og@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:50:02 +02:00
Andres Freund 144666f65b Build column mapping for grouping sets in all required cases.
The previous coding frequently failed to fail because for one it's
unusual to have rollup clauses with one column, and for another
sometimes the wrong mapping didn't cause obvious problems.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=W=9=hQOipH0HAPbkun3Z3TFWij_EiHue0_6UX=oR=1kw@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:46:27 +02:00
Tom Lane dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Joe Conway b26e3d660d Make RLS work with UPDATE ... WHERE CURRENT OF
UPDATE ... WHERE CURRENT OF would not work in conjunction with
RLS. Arrange to allow the CURRENT OF expression to be pushed down.
Issue noted by Peter Geoghegan. Patch by Dean Rasheed. Back patch
to 9.5 where RLS was introduced.
2015-07-24 12:55:30 -07:00
Tom Lane 46d0a9bfac Fix add_rte_to_flat_rtable() for recent feature additions.
The TABLESAMPLE and row security patches each overlooked this function,
though their errors of omission were opposite: RLS failed to zero out the
securityQuals field, leading to wasteful copying of useless expression
trees in finished plans, while TABLESAMPLE neglected to add a comment
saying that it intentionally *isn't* deleting the tablesample subtree.
There probably should be a similar comment about ctename, too.

Back-patch as appropriate.
2015-07-21 20:03:58 -04:00
Magnus Hagander 828df727a6 Fix spelling error
David Rowley
2015-07-16 10:31:58 +03:00
Andres Freund 3ed26e5f87 For consistency add a pfree to ON CONFLICT set_plan_refs code.
Backpatch to 9.5 where ON CONFLICT was introduced.

Author: Peter Geoghegan
2015-07-12 22:18:57 +02:00
Heikki Linnakangas 7845db2aa7 Fix typo in comment
Etsuro Fujita
2015-06-27 10:17:42 +03:00
Robert Haas 5ca611841b Improve handling of CustomPath/CustomPlan(State) children.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.

KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
2015-06-26 09:40:47 -04:00
Robert Haas 51d0fe5d56 Update get_relation_info comment.
Thomas Munro
2015-06-23 10:09:53 -04:00
Tom Lane 2cb9ec1bcb Improve inheritance_planner()'s performance for large inheritance sets.
Commit c03ad5602f introduced a planner
performance regression for UPDATE/DELETE on large inheritance sets.
It required copying the append_rel_list (which is of size proportional to
the number of inherited tables) once for each inherited table, thus
resulting in O(N^2) time and memory consumption.  While it's difficult to
avoid that in general, the extra work only has to be done for
append_rel_list entries that actually reference subquery RTEs, which
inheritance-set entries will not.  So we can buy back essentially all of
the loss in cases without subqueries in FROM; and even for those, the added
work is mainly proportional to the number of UNION ALL subqueries.

Back-patch to 9.2, like the previous commit.

Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
2015-06-22 18:53:27 -04:00
Tom Lane 3b0f77601b Fix some questionable edge-case behaviors in add_path() and friends.
add_path_precheck was doing exact comparisons of path costs, but it really
needs to do them fuzzily to be sure it won't reject paths that could
survive add_path's comparisons.  (This can only matter if the initial cost
estimate is very close to the final one, but that turns out to often be
true.)

Also, it should ignore startup cost for this purpose if and only if
compare_path_costs_fuzzily would do so.  The previous coding always ignored
startup cost for parameterized paths, which is wrong as of commit
3f59be836c555fa6; it could result in improper early rejection of paths that
we care about for SEMI/ANTI joins.  It also always considered startup cost
for unparameterized paths, which is just as wrong though the only effect is
to waste planner cycles on paths that can't survive.  Instead, it should
consider startup cost only when directed to by the consider_startup/
consider_param_startup relation flags.

Likewise, compare_path_costs_fuzzily should have symmetrical behavior
for parameterized and unparameterized paths.  In this case, the best
answer seems to be that after establishing that total costs are fuzzily
equal, we should compare startup costs whether or not the consider_xxx
flags are on.  That is what it's always done for unparameterized paths,
so let's make the behavior for parameterized  paths match.

These issues were noted while developing the SEMI/ANTI join costing fix
of commit 3f59be836c, but we chose not to back-patch these fixes,
because they can cause changes in the planner's choices among
nearly-same-cost plans.  (There is in fact one minor change in plan choice
within the core regression tests.)  Destabilizing plan choices in back
branches without very clear improvements is frowned on, so we'll just fix
this in HEAD.
2015-06-03 18:02:39 -04:00
Tom Lane 3f59be836c Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.
When the inner side of a nestloop SEMI or ANTI join is an indexscan that
uses all the join clauses as indexquals, it can be presumed that both
matched and unmatched outer rows will be processed very quickly: for
matched rows, we'll stop after fetching one row from the indexscan, while
for unmatched rows we'll have an indexscan that finds no matching index
entries, which should also be quick.  The planner already knew about this,
but it was nonetheless charging for at least one full run of the inner
indexscan, as a consequence of concerns about the behavior of materialized
inner scans --- but those concerns don't apply in the fast case.  If the
inner side has low cardinality (many matching rows) this could make an
indexscan plan look far more expensive than it actually is.  To fix,
rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
don't add the inner scan cost until we've inspected the indexquals, and
then we can add either the full-run cost or just the first tuple's cost as
appropriate.

Experimentation with this fix uncovered another problem: add_path and
friends were coded to disregard cheap startup cost when considering
parameterized paths.  That's usually okay (and desirable, because it thins
the path herd faster); but in this fast case for SEMI/ANTI joins, it could
result in throwing away the desired plain indexscan path in favor of a
bitmap scan path before we ever get to the join costing logic.  In the
many-matching-rows cases of interest here, a bitmap scan will do a lot more
work than required, so this is a problem.  To fix, add a per-relation flag
consider_param_startup that works like the existing consider_startup flag,
but applies to parameterized paths, and set it for relations that are the
inside of a SEMI or ANTI join.

To make this patch reasonably safe to back-patch, care has been taken to
avoid changing the planner's behavior except in the very narrow case of
SEMI/ANTI joins with inner indexscans.  There are places in
compare_path_costs_fuzzily and add_path_precheck that are not terribly
consistent with the new approach, but changing them will affect planner
decisions at the margins in other cases, so we'll leave that for a
HEAD-only fix.

Back-patch to 9.3; before that, the consider_startup flag didn't exist,
meaning that the second aspect of the patch would be too invasive.

Per a complaint from Peter Holzer and analysis by Tomas Vondra.
2015-06-03 11:59:10 -04:00
Tom Lane 2aa0476dc3 Manual cleanup of pgindent results.
Fix some places where pgindent did silly stuff, often because project
style wasn't followed to begin with.  (I've not touched the atomics
headers, though.)
2015-05-24 15:04:10 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Andres Freund 631d749007 Remove the new UPSERT command tag and use INSERT instead.
Previously, INSERT with ON CONFLICT DO UPDATE specified used a new
command tag -- UPSERT.  It was introduced out of concern that INSERT as
a command tag would be a misrepresentation for ON CONFLICT DO UPDATE, as
some affected rows may actually have been updated.

Alvaro Herrera noticed that the implementation of that new command tag
was incomplete; in subsequent discussion we concluded that having it
doesn't provide benefits that are in line with the compatibility breaks
it requires.

Catversion bump due to the removal of PlannedStmt->isUpsert.

Author: Peter Geoghegan
Discussion: 20150520215816.GI5885@postgresql.org
2015-05-23 00:58:45 +02:00
Tom Lane c5dd8ead40 More fixes for lossy-GiST-distance-functions patch.
Paul Ramsey reported that commit 35fcb1b3d0
induced a core dump on commuted ORDER BY expressions, because it was
assuming that the indexorderby expression could be found verbatim in the
relevant equivalence class, but it wasn't there.  We really don't need
anything that complicated anyway; for the data types likely to be used for
index ORDER BY operators in the foreseeable future, the exprType() of the
ORDER BY expression will serve fine.  (The case where we'd have to work
harder is where the ORDER BY expression's result is only binary-compatible
with the declared input type of the ordering operator; long before worrying
about that, one would need to get rid of GiST's hard-wired assumption that
said datatype is float8.)

Aside from fixing that crash and adding a regression test for the case,
I did some desultory code review:

nodeIndexscan.c was likewise overthinking how hard it ought to work to
identify the datatype of the ORDER BY expressions.

Add comments explaining how come nodeIndexscan.c can get away with
simplifying assumptions about NULLS LAST ordering and no backward scan.

Revert no-longer-needed changes of find_ec_member_for_tle(); while the
new definition was no worse than the old, it wasn't better either, and
it might cause back-patching pain.

Revert entirely bogus additions to genam.h.
2015-05-21 19:47:48 -04:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Andres Freund 0740cbd759 Refactor ON CONFLICT index inference parse tree representation.
Defer lookup of opfamily and input type of a of a user specified opclass
until the optimizer selects among available unique indexes; and store
the opclass in the parse analyzed tree instead.  The primary reason for
doing this is that for rule deparsing it's easier to use the opclass
than the previous representation.

While at it also rename a variable in the inference code to better fit
it's purpose.

This is separate from the actual fixes for deparsing to make review
easier.
2015-05-19 21:21:27 +02:00
Tom Lane 424661913c Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c.
This oversight results in a crash at executor startup if the plan has
been copied.  outfuncs.c was missed as well.

While we could probably have taught both those files to cope with the
originally chosen representation of an Oid array, it would have been
painful, not least because there'd be no easy way to verify the array
length.  An Oid List is far easier to work with.  And AFAICS, there is
no particular notational benefit to using an array rather than a list
in the existing parts of the patch either.  So just change it to a list.

Error in commit 35fcb1b3d0, which is new,
so no need for back-patch.
2015-05-17 21:22:12 -04:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Heikki Linnakangas 35fcb1b3d0 Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.

Alexander Korotkov and me
2015-05-15 14:26:51 +03:00
Andres Freund 4af6e61a36 Fix ON CONFLICT bugs that manifest when used in rules.
Specifically the tlist and rti of the pseudo "excluded" relation weren't
properly treated by expression_tree_walker, which lead to errors when
excluded was referenced inside a rule because the varnos where not
properly adjusted.  Similar omissions in OffsetVarNodes and
expression_tree_mutator had less impact, but should obviously be fixed
nonetheless.

A couple tests of for ON CONFLICT UPDATE into INSERT rule bearing
relations have been added.

In passing I updated a couple comments.
2015-05-13 00:13:22 +02:00
Tom Lane afb9249d06 Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later.  This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier.  Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw.  We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking.  We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server.  The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-12 14:10:17 -04:00
Tom Lane 1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Andres Freund bab64ef9e8 Fix two problems in infer_arbiter_indexes().
The first is a pretty simple bug where a relcache entry is used after
the relation is closed. In this particular situation it does not appear
to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.

The second is that infer_arbiter_indexes() skipped indexes that aren't
yet valid according to indcheckxmin. That's not required here, because
uniqueness checks don't care about visibility according to an older
snapshot.  While thats not really a bug, it makes things undesirably
non-deterministic.  There is some hope that this explains a test failure
on buildfarm member jaguarundi.

Discussion: 9096.1431102730@sss.pgh.pa.us
2015-05-08 22:28:23 +02:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Andres Freund 2c8f4836db Represent columns requiring insert and update privileges indentently.
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held.  As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund
2015-05-08 00:20:46 +02:00
Robert Haas e7cb7ee145 Allow FDWs and custom scan providers to replace joins with scans.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
2015-05-01 08:50:35 -04:00
Stephen Frost dcbf5948e1 Improve qual pushdown for RLS and SB views
The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view.  This optimization improves on that situation by, instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions and allows functions which do not
accept arguments or whose arguments are not from the current query level
(eg: constants can be particularly useful) to be pushed down.

As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting a certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks.  That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.

Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.
2015-04-27 12:29:42 -04:00
Andres Freund 2e3ca04e2e Also correct therefor to therefore.
Since both forms are arguably legal I wasn't sure about changing
this. But then Tom argued for 'therefore'...

Author: Dmitriy Olshevskiy
Discussion: 34789.1430067832@sss.pgh.pa.us
2015-04-26 19:05:39 +02:00
Tom Lane 3cf8686014 Prevent improper reordering of antijoins vs. outer joins.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that.  In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks.  It's broken as far back as we had antijoins, so back-patch to all
supported branches.
2015-04-25 16:44:27 -04:00
Tom Lane 70d44dd9de Fix obsolete comment in set_rel_size().
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80b, which split what
had been set_rel_pathlist() and child routines into two sets of
functions.  But I (tgl) evidently missed updating this comment.

Back-patch to 9.2 to avoid unnecessary divergence among branches.

Amit Langote
2015-04-24 15:18:07 -04:00
Stephen Frost 0bf22e0c8b RLS fixes, new hooks, and new test module
In prepend_row_security_policies(), defaultDeny was always true, so if
there were any hook policies, the RLS policies on the table would just
get discarded.  Fixed to start off with defaultDeny as false and then
properly set later if we detect that only the default deny policy exists
for the internal policies.

The infinite recursion detection in fireRIRrules() didn't properly
manage the activeRIRs list in the case of WCOs, so it would incorrectly
report infinite recusion if the same relation with RLS appeared more
than once in the rtable, for example "UPDATE t ... FROM t ...".

Further, the RLS expansion code in fireRIRrules() was handling RLS in
the main loop through the rtable, which lead to RTEs being visited twice
if they contained sublink subqueries, which
prepend_row_security_policies() attempted to handle by exiting early if
the RTE already had securityQuals.  That doesn't work, however, since
if the query involved a security barrier view on top of a table with
RLS, the RTE would already have securityQuals (from the view) by the
time fireRIRrules() was invoked, and so the table's RLS policies would
be ignored.  This is fixed in fireRIRrules() by handling RLS in a
separate loop at the end, after dealing with any other sublink
subqueries, thus ensuring that each RTE is only visited once for RLS
expansion.

The inheritance planner code didn't correctly handle non-target
relations with RLS, which would get turned into subqueries during
planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where
t1 has inheritance and t2 has RLS quals would fail.  Fix by making sure
to copy in and update the securityQuals when they exist for non-target
relations.

process_policies() was adding WCOs to non-target relations, which is
unnecessary, and could lead to a lot of wasted time in the rewriter and
the planner. Fix by only adding WCO policies when working on the result
relation.  Also in process_policies, we should be copying the USING
policies to the WITH CHECK policies on a per-policy basis, fix by moving
the copying up into the per-policy loop.

Lastly, as noted by Dean, we were simply adding policies returned by the
hook provided to the list of quals being AND'd, meaning that they would
actually restrict records returned and there was no option to have
internal policies and hook-based policies work together permissively (as
all internal policies currently work).  Instead, explicitly add support
for both permissive and restrictive policies by having a hook for each
and combining the results appropriately.  To ensure this is all done
correctly, add a new test module (test_rls_hooks) to test the various
combinations of internal, permissive, and restrictive hook policies.

Largely from Dean Rasheed (thanks!):

CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com

Author: Dean Rasheed, though I added the new hooks and test module.
2015-04-22 12:01:06 -04:00
Stephen Frost 4ccc5bd28e Pull in tableoid for inheiritance with rowMarks
As noted by Etsuro Fujita [1] and Dean Rasheed[2],
cb1ca4d800 changed ExecBuildAuxRowMark()
to always look for the tableoid in the target list, but didn't also
change preprocess_targetlist() to always include the tableoid.  This
resulted in errors with soon-to-be-added RLS with inheritance tests,
and errors when using inheritance with foreign tables.

Authors: Etsuro Fujita and Dean Rasheed (independently)

Minor word-smithing on the comments by me.

[1] 552CF0B6.8010006@lab.ntt.co.jp
[2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com
2015-04-22 11:29:35 -04:00
Tom Lane ca6805338f Fix incorrect matching of subexpressions in outer-join plan nodes.
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node.  However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.

The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes.  But we might do
so if the expressions are sort keys for a mergejoin, for example.

Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable.  For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)

Per report from Qingqing Zhou.  The added regression test case is based
on his example.

This has been broken for a very long time, so back-patch to all active
branches.
2015-04-04 19:55:15 -04:00
Heikki Linnakangas d04c8ed904 Add support for index-only scans in GiST.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.

This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).

Anastasia Lubennikova, with additional fixes and modifications by me.
2015-03-26 19:12:00 +02:00
Tom Lane cb1ca4d800 Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.

An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:

 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)

This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
2015-03-22 13:53:21 -04:00
Tom Lane 7b8b8a4331 Improve representation of PlanRowMark.
This patch fixes two inadequacies of the PlanRowMark representation.

First, that the original LockingClauseStrength isn't stored (and cannot be
inferred for foreign tables, which always get ROW_MARK_COPY).  Since some
PlanRowMarks are created out of whole cloth and don't actually have an
ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to
enum LockingClauseStrength, which is fairly annoying but the alternatives
seem worse.  This fix allows getting rid of the use of get_parse_rowmark()
in FDWs (as per the discussion around commits 462bd95705 and
8ec8760fc8), and it simplifies some things elsewhere.

Second, that the representation assumed that all child tables in an
inheritance hierarchy would use the same RowMarkType.  That's true today
but will soon not be true.  We add an "allMarkTypes" field that identifies
the union of mark types used in all a parent table's children, and use
that where appropriate (currently, only in preprocess_targetlist()).

In passing fix a couple of minor infelicities left over from the SKIP
LOCKED patch, notably that _outPlanRowMark still thought waitPolicy
is a bool.

Catversion bump is required because the numeric values of enum
LockingClauseStrength can appear in on-disk rules.

Extracted from a much larger patch to support foreign table inheritance;
it seemed worth breaking this out, since it's a separable concern.

Shigeru Hanada and Etsuro Fujita, somewhat modified by me
2015-03-15 18:41:47 -04:00
Tom Lane f4abd0241d Support flattening of empty-FROM subqueries and one-row VALUES tables.
We can't handle this in the general case due to limitations of the
planner's data representations; but we can allow it in many useful cases,
by being careful to flatten only when we are pulling a single-row subquery
up into a FROM (or, equivalently, inner JOIN) node that will still have at
least one remaining relation child.  Per discussion of an example from
Kyotaro Horiguchi.
2015-03-11 23:18:03 -04:00
Tom Lane b746d0c32d Fix old bug in get_loop_count().
While poking at David Kubečka's issue I noticed an ancient logic error
in get_loop_count(): it used 1.0 as a "no data yet" indicator, but since
that is actually a valid rowcount estimate, this doesn't work.  If we
have one input relation with 1.0 as rowcount and then another one with
a larger rowcount, we should use 1.0 as the result, but we picked the
larger rowcount instead.  (I think when I coded this, I recognized the
conflict, but mistakenly thought that the logic would pick the desired
count anyway.)

Fixing this changed the plan for one existing regression test case.
Since the point of that test is to exercise creation of a particular
shape of nestloop plan, I tweaked the query a little bit so it still
results in the same plan choice.

This is definitely a bug, but I'm hesitant to back-patch since it might
change plan choices unexpectedly, and anyway failure to implement a
heuristic precisely as intended is a pretty low-grade bug.
2015-03-11 22:53:32 -04:00
Tom Lane b55722692b Improve planner's cost estimation in the presence of semijoins.
If we have a semijoin, say
	SELECT * FROM x WHERE x1 IN (SELECT y1 FROM y)
and we're estimating the cost of a parameterized indexscan on x, the number
of repetitions of the indexscan should not be taken as the size of y; it'll
really only be the number of distinct values of y1, because the only valid
plan with y on the outside of a nestloop would require y to be unique-ified
before joining it to x.  Most of the time this doesn't make that much
difference, but sometimes it can lead to drastically underestimating the
cost of the indexscan and hence choosing a bad plan, as pointed out by
David Kubečka.

Fixing this is a bit difficult because parameterized indexscans are costed
out quite early in the planning process, before we have the information
that would be needed to call estimate_num_groups() and thereby estimate the
number of distinct values of the join column(s).  However we can move the
code that extracts a semijoin RHS's unique-ification columns, so that it's
done in initsplan.c rather than on-the-fly in create_unique_path().  That
shouldn't make any difference speed-wise and it's really a bit cleaner too.

The other bit of information we need is the size of the semijoin RHS,
which is easy if it's a single relation (we make those estimates before
considering indexscan costs) but problematic if it's a join relation.
The solution adopted here is just to use the product of the sizes of the
join component rels.  That will generally be an overestimate, but since
estimate_num_groups() only uses this input as a clamp, an overestimate
shouldn't hurt us too badly.  In any case we don't allow this new logic
to produce a value larger than we would have chosen before, so that at
worst an overestimate leaves us no wiser than we were before.
2015-03-11 21:21:00 -04:00
Robert Haas bc93ac12c2 Require non-NULL pstate for all addRangeTableEntryFor* functions.
Per discussion, it's better to have a consistent coding rule here.

Michael Paquier, per a node from Greg Stark referencing an old post
from Tom Lane.
2015-03-11 15:26:43 -04:00
Tom Lane 497bac7d29 Fix long-obsolete code for separating filter conditions in cost_index().
This code relied on pointer equality to identify which restriction clauses
also appear in the indexquals (and, therefore, don't need to be applied as
simple filter conditions).  That was okay once upon a time, years ago,
before we introduced the equivalence-class machinery.  Now there's about a
50-50 chance that an equality clause appearing in the indexquals will be
the mirror image (commutator) of its mate in the restriction list.  When
that happens, we'd erroneously think that the clause would be re-evaluated
at each visited row, and therefore inflate the cost estimate for the
indexscan by the clause's cost.

Add some logic to catch this case.  It seems to me that it continues not to
be worthwhile to expend the extra predicate-proof work that createplan.c
will do on the finally-selected plan, but this case is common enough and
cheap enough to handle that we should do so.

This will make a small difference (about one cpu_operator_cost per row)
in simple cases; but in situations where there's an expensive function in
the indexquals, it can make a very large difference, as seen in recent
example from Jeff Janes.

This is a long-standing bug, but I'm hesitant to back-patch because of the
possibility of destabilizing plan choices that people may be happy with.
2015-03-03 21:19:42 -05:00
Stephen Frost ee4ddcb38a Fix targetRelation initializiation in prepsecurity
In 6f9bd50eab, we modified
expand_security_quals() to tell expand_security_qual() about when the
current RTE was the targetRelation.  Unfortunately, that commit
initialized the targetRelation variable used outside of the loop over
the RTEs instead of at the start of it.

This patch moves the variable and the initialization of it into the
loop, where it should have been to begin with.

Pointed out by Dean Rasheed.

Back-patch to 9.4 as the original commit was.
2015-03-01 15:27:26 -05:00
Tom Lane b514a7460d Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle
star-schema queries efficiently, but some overly-restrictive search
limiting logic added in commit e2fa76d80b
prevented such cases from working as desired.  Fix that and add a
regression test about it.  Per gripe from Marc Cousin.

This is arguably a bug rather than a new feature, so back-patch to 9.2
where parameterized paths were introduced.
2015-02-28 12:43:04 -05:00
Stephen Frost 6f9bd50eab Add locking clause for SB views for update/delete
In expand_security_qual(), we were handling locking correctly when a
PlanRowMark existed, but not when we were working with the target
relation (which doesn't have any PlanRowMarks, but the subquery created
for the security barrier quals still needs to lock the rows under it).

Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
properly issuing a SELECT ... FOR UPDATE to the remote side under a
DELETE.

Back-patch to 9.4 where updatable security barrier views were
introduced.

Per discussion with Etsuro and Dean Rasheed.
2015-02-25 21:36:29 -05:00
Tom Lane c063da1769 Add parse location fields to NullTest and BooleanTest structs.
We did not need a location tag on NullTest or BooleanTest before, because
no error messages referred directly to their locations.  That's planned
to change though, so add these fields in a separate housekeeping commit.

Catversion bump because stored rules may change.
2015-02-22 14:40:27 -05:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane f2874feb7c Some more FLEXIBLE_ARRAY_MEMBER fixes. 2015-02-21 01:46:43 -05:00
Tom Lane abe45a9b31 Fix EXPLAIN output for cases where parent table is excluded by constraints.
The previous coding in EXPLAIN always labeled a ModifyTable node with the
name of the target table affected by its first child plan.  When originally
written, this was necessarily the parent table of the inheritance tree,
so everything was unconfusing.  But when we added NO INHERIT constraints,
it became possible for the parent table to be deleted from the plan by
constraint exclusion while still leaving child tables present.  This led to
the ModifyTable plan node being labeled with the first surviving child,
which was deemed confusing.  Fix it by retaining the parent table's RT
index in a new field in ModifyTable.

Etsuro Fujita, reviewed by Ashutosh Bapat and myself
2015-02-17 18:04:11 -05:00
Tom Lane 1a179f36f7 Fix GEQO to not assume its join order heuristic always works.
Back in commit 400e2c9344 I rewrote GEQO's
gimme_tree function to improve its heuristic for modifying the given tour
into a legal join order.  In what can only be called a fit of hubris,
I supposed that this new heuristic would *always* find a legal join order,
and ripped out the old logic that allowed gimme_tree to sometimes fail.

The folly of this is exposed by bug #12760, in which the "greedy" clumping
behavior of merge_clump() can lead it into a dead end which could only be
recovered from by un-clumping.  We have no code for that and wouldn't know
exactly what to do with it if we did.  Rather than try to improve the
heuristic rules still further, let's just recognize that it *is* a
heuristic and probably must always have failure cases.  So, put back the
code removed in the previous commit to allow for failure (but comment it
a bit better this time).

It's possible that this code was actually fully correct at the time and
has only been broken by the introduction of LATERAL.  But having seen this
example I no longer have much faith in that proposition, so back-patch to
all supported branches.
2015-02-10 20:37:19 -05:00
Tom Lane 75df6dc083 Fix ancient thinko in default table rowcount estimation.
The code used sizeof(ItemPointerData) where sizeof(ItemIdData) is correct,
since we're trying to account for a tuple's line pointer.  Spotted by
Tomonari Katsumata (bug #12584).

Although this mistake is of very long standing, no back-patch, since it's
a relatively harmless error and changing it would risk changing default
planner behavior in stable branches.  (I don't see any change in regression
test outputs here, but the buildfarm may think differently.)
2015-01-18 17:04:11 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Tom Lane 462bd95705 Fix planning of SELECT FOR UPDATE on child table with partial index.
Ordinarily we can omit checking of a WHERE condition that matches a partial
index's condition, when we are using an indexscan on that partial index.
However, in SELECT FOR UPDATE we must include the "redundant" filter
condition in the plan so that it gets checked properly in an EvalPlanQual
recheck.  The planner got this mostly right, but improperly omitted the
filter condition if the index in question was on an inheritance child
table.  In READ COMMITTED mode, this could result in incorrectly returning
just-updated rows that no longer satisfy the filter condition.

The cause of the error is using get_parse_rowmark() when get_plan_rowmark()
is what should be used during planning.  In 9.3 and up, also fix the same
mistake in contrib/postgres_fdw.  It's currently harmless there (for lack
of inheritance support) but wrong is wrong, and the incorrect code might
get copied to someplace where it's more significant.

Report and fix by Kyotaro Horiguchi.  Back-patch to all supported branches.
2014-12-11 21:02:25 -05:00
Tom Lane d25367ec4f Add bms_get_singleton_member(), and use it where appropriate.
This patch adds a function that replaces a bms_membership() test followed
by a bms_singleton_member() call, performing both the test and the
extraction of a singleton set's member in one scan of the bitmapset.
The performance advantage over the old way is probably minimal in current
usage, but it seems worthwhile on notational grounds anyway.

David Rowley
2014-11-28 14:16:24 -05:00
Tom Lane f4e031c662 Add bms_next_member(), and use it where appropriate.
This patch adds a way of iterating through the members of a bitmapset
nondestructively, unlike the old way with bms_first_member().  While
bms_next_member() is very slightly slower than bms_first_member()
(at least for typical-size bitmapsets), eliminating the need to palloc
and pfree a temporary copy of the target bitmapset is a significant win.
So this method should be preferred in all cases where a temporary copy
would be necessary.

Tom Lane, with suggestions from Dean Rasheed and David Rowley
2014-11-28 13:37:25 -05:00
Stephen Frost 143b39c185 Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies.  This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.

The security consideration for COPY with row level security, also
pointed out by Robert, has also been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.

Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places.  This patch fixes that too, which also happens to add the
ability to COMMENT on policies.

In passing, attempt to improve the consistency of messages, comments,
and documentation as well.  This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.

Happy Thanksgiving!
2014-11-27 01:15:57 -05:00
Tom Lane bac27394a1 Support arrays as input to array_agg() and ARRAY(SELECT ...).
These cases formerly failed with errors about "could not find array type
for data type".  Now they yield arrays of the same element type and one
higher dimension.

The implementation involves creating functions with API similar to the
existing accumArrayResult() family.  I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result.  (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.

Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
2014-11-25 12:21:28 -05:00
Tom Lane b62f94c603 Allow simplification of EXISTS() subqueries containing LIMIT.
The locution "EXISTS(SELECT ... LIMIT 1)" seems to be rather common among
people who don't realize that the database already performs optimizations
equivalent to putting LIMIT 1 in the sub-select.  Unfortunately, this was
actually making things worse, because it prevented us from optimizing such
EXISTS clauses into semi or anti joins.  Teach simplify_EXISTS_query() to
suppress constant-positive LIMIT clauses.  That fixes the semi/anti-join
case, and may help marginally even for cases that have to be left as
sub-SELECTs.

Marti Raudsepp, reviewed by David Rowley
2014-11-22 19:12:38 -05:00
Tom Lane 9c58101117 Fix mishandling of system columns in FDW queries.
postgres_fdw would send query conditions involving system columns to the
remote server, even though it makes no effort to ensure that system
columns other than CTID match what the remote side thinks.  tableoid,
in particular, probably won't match and might have some use in queries.
Hence, prevent sending conditions that include non-CTID system columns.

Also, create_foreignscan_plan neglected to check local restriction
conditions while determining whether to set fsSystemCol for a foreign
scan plan node.  This again would bollix the results for queries that
test a foreign table's tableoid.

Back-patch the first fix to 9.3 where postgres_fdw was introduced.
Back-patch the second to 9.2.  The code is probably broken in 9.1 as
well, but the patch doesn't apply cleanly there; given the weak state
of support for FDWs in 9.1, it doesn't seem worth fixing.

Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
2014-11-22 16:01:05 -05:00
Tom Lane 447770404c Rearrange CustomScan API.
Make it work more like FDW plans do: instead of assuming that there are
expressions in a CustomScan plan node that the core code doesn't know
about, insist that all subexpressions that need planner attention be in
a "custom_exprs" list in the Plan representation.  (Of course, the
custom plugin can break the list apart again at executor initialization.)
This lets us revert the parts of the patch that exposed setrefs.c and
subselect.c processing to the outside world.

Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept
may work in future, but it's far from fully baked right now.
2014-11-21 18:21:46 -05:00