Commit 51ee6f3160 accidentally changed
the behavior around inheritance hierarchies; before, we always
considered parallel paths even for very small inheritance children,
because otherwise an inheritance hierarchy with even one small child
wouldn't be eligible for parallelism. That exception was inadverently
removed; put it back.
In passing, also adjust the degree-of-parallelism comptuation for
index-only scans not to consider the number of heap pages fetched.
Otherwise, we'll avoid parallel index-only scans on tables that are
mostly all-visible, which isn't especially logical.
Robert Haas and Amit Kapila, per a report from Ashutosh Sharma.
Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com
This logic was adapated from create_merge_append_plan, but the two
cases aren't really analogous, because create_merge_append_plan is not
projection-capable and must therefore have a tlist identical to that
of the underlying paths. Overwriting the tlist of Gather Merge with
whatever the underlying plan happens to produce is no good at all.
Patch by me, reviewed by Rushabh Lathia, who also reported the issue
and made an initial attempt at a fix.
Discussion: http://postgr.es/m/CA+Tgmob_-oHEOBfT9S25bjqokdqv8e8xEmh9zOY+3MPr_LmuhA@mail.gmail.com
Fallout from fcec6caafa2: mark a variable in
set_tablefunc_size_estimates as used for asserts only.
Also, the planner_rte_fetch() call is pointless with assertions
disabled, so enclose it in a USE_ASSERT_CHECKING #ifdef; fix the same
problem in set_subquery_size_estimates().
First problem noted by David Rowley, whose compiler is noisier than mine
in this regard.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
Commit 45be99f8cd took the position
that performing a merge join in parallel was not likely to work out
well, but this conclusion was greeted with skepticism even at the
time. Whether it was true then or not, it's clearly not true any
more now that we have parallel index scan.
Dilip Kumar, reviewed by Amit Kapila and by me.
Discussion: http://postgr.es/m/CAFiTN-v3=cM6nyFwFGp0fmvY4=kk79Hq9Fgu0u8CSJ-EEq1Tiw@mail.gmail.com
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.
Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.
Patch by me, reviewed by Dilip Kumar.
Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
When the very cheapest path is not parallel-safe, we want to instead use
the cheapest unparameterized path that is. The old code searched
innerrel->cheapest_parameterized_paths, but that isn't right, because
the path we want may not be in that list. Search innerrel->pathlist
instead.
Spotted by Dilip Kumar.
Discussion: http://postgr.es/m/CAFiTN-szCEcZrQm0i_w4xqSaRUTOUFstNu32Zn4rxxDcoa8gnA@mail.gmail.com
Commit 5262f7a4fc added similar support
for parallel index scans; this extends that work to index-only scans.
As with parallel index scans, this requires support from the index AM,
so currently parallel index-only scans will only be possible for btree
indexes.
Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja,
and Amit Kapila
Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.
Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.
Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
I noticed while hacking on join UNION transforms that planner.c's
function get_base_rel_indexes() just duplicates the functionality of
get_relids_in_jointree(). It doesn't even have the excuse of being
older code :-(. Drop it and use the latter function instead.
If we forcibly place a Material node atop a finished subplan, we need
to move any initPlans attached to the subplan up to the Material node,
in order to keep SS_finalize_plan() happy. I'd figured this out in
commit 7b67a0a49 for the case of materializing a cursor plan, but out of
an abundance of caution, I put the initPlan movement hack at the call
site for that case, rather than inside materialize_finished_plan().
That was the wrong thing, because it turns out to also be necessary for
the only other caller of materialize_finished_plan(), ie subselect.c.
We lacked any test cases that exposed the mistake, but bug#14524 from
Wei Congrui shows that it's possible to get an initPlan reference into
the top tlist in that case too, and then SS_finalize_plan() complains.
Hence, move the hack into materialize_finished_plan().
In HEAD, also relocate some recently-added tests in subselect.sql, which
I'd unthinkingly dropped into the middle of a sequence of related tests.
Report: https://postgr.es/m/20170202060020.1400.89021@wrigleys.postgresql.org
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal(). Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.
What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.
Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation. But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".
Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.
In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.
Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
Currently, we only need this logic in order to cost a Bitmap Heap
Scan. But a pending patch for Parallel Bitmap Heap Scan also uses
it to help figure out how many workers to use for the scan, which
has to be determined prior to costing. So, move the logic to
a separate function to make that easier.
Dilip Kumar. The patch series of which this is a part has been
reviewed by Andres Freund, Amit Khendekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, and me; it is not clear from the email
discussion which of those people have looked specifically at this
part.
Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com
Joining three tables only takes two join nodes. I think when I (tgl)
wrote this, I was envisioning possible additional joins; but since the
example doesn't show any fourth table, it's just confusing to write
a third join node.
Etsuro Fujita
Discussion: https://postgr.es/m/e6cfbaa3-af02-1abc-c25e-8fa5c6bc4e21@lab.ntt.co.jp
I'd somehow talked myself into believing that set_append_rel_size
doesn't need to worry about getting back an AND clause when it applies
eval_const_expressions to the result of adjust_appendrel_attrs (that is,
transposing the appendrel parent's restriction clauses for one child).
But that is nonsense, and Andreas Seltenreich's fuzz tester soon
turned up a counterexample. Put back the make_ands_implicit step
that was there before, and add a regression test covering the case.
Report: https://postgr.es/m/878tq6vja6.fsf@ansel.ydns.eu
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.
This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.
Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.
Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass. Add a check to
allow the same type of deduction to be made in this case too.
Per a complaint from Dima Pavlov. Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us
This patch makes several changes that improve the consistency of
representation of lists of statements. It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list. This patch brings
similar consistency to the outputs of raw parsing and planning steps:
* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.
* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements. In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node. This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.
Now, every list of statements has a consistent head-node type depending
on how far along it is in processing. This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.
Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way. It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)
Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list. While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.
The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement. This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)
There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.
Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes. This allows
more intelligent handling of cases where a source query string contains
multiple statements. This patch doesn't actually do anything with the
information, but a follow-on patch will. (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)
catversion bump because addition of location fields to struct Query
affects stored rules.
This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.
Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong. The previous coding
got this completely wrong for parallel joins.
Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet. Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.
Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.
Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.
Extracted by me from a larger patch by Amit Langote.
This shouldn't change the set of paths that get generated in any
way, but it is preparatory work for further changes to allow a
partial path to be merge-joined witih a non-partial path to produce
a partial join path.
Dilip Kumar, with cosmetic adjustments by me.
This case wasn't thought through sufficiently in commit 100340e2d.
It's true that the FK proves that every outer row has a match in the
inner table, but we forgot that some of the inner rows might be filtered
away by WHERE conditions located within the semijoin's RHS.
If the RHS is just one table, we can reasonably take the semijoin
selectivity as equal to the fraction of the referenced table's rows
that are expected to survive its restriction clauses.
If the RHS is a join, it's not clear how much of the referenced table
might get through the join, so fall back to the same rule we were
already using for other outer-join cases: use the minimum of the
regular per-clause selectivity estimates. This gives the same result
as if we hadn't considered the FK at all when there's a single FK
column, but it should still help for multi-column FKs, which is the
case that 100340e2d is really meant to help with.
Back-patch to 9.6 where the previous commit came in.
Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
The existing tests in preprocess_minmax_aggregates() usually prevent it
from trying to do anything with queries containing CTEs, but there's an
exception: a CTE could be present as a member of an appendrel, if we
flattened a UNION ALL that contains CTE references. If it did try to
generate an optimized path for a query using a CTE, it failed with
"could not find plan for CTE", as reported by Torsten Förtsch.
The proximate cause is an unwise decision in commit 3fc6e2d7f to clear
subroot->cte_plan_ids in build_minmax_path(). That left the subroot's
cte_plan_ids list out of step with its parse->cteList.
Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let
the case work again, but really it's pretty silly to be expending any
cycles at all in this module when there are CTEs: we always treat their
outputs as unordered so there's no way for the optimization to win.
Hence, also add an early-exit test so we don't waste time like that.
Back-patch to 9.6 where the misbehavior was introduced.
Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com
expandRTE() and get_rte_attribute_type() reported the exprType() and
exprTypmod() values of the expressions in the first row of the VALUES as
being the column type/typmod returned by the VALUES RTE. That's fine for
the data type, since we coerce all expressions in a column to have the same
common type. But we don't coerce them to have a common typmod, so it was
possible for rows after the first one to return values that violate the
claimed column typmod. This leads to the incorrect result seen in bug
#14448 from Hassan Mahmood, as well as some other corner-case misbehaviors.
The desired behavior is the same as we use in other type-unification
cases: report the common typmod if there is one, but otherwise return -1
indicating no particular constraint. It's cheap for transformValuesClause
to determine the common typmod while transforming a multi-row VALUES, but
it'd be less cheap for expandRTE() and get_rte_attribute_type() to
re-determine that info every time they're asked --- possibly a lot less
cheap, if the VALUES has many rows. Therefore, the best fix is to record
the common typmods explicitly in a list in the VALUES RTE, as we were
already doing for column collations. This looks quite a bit like what
we're doing for CTE RTEs, so we can save a little bit of space and code by
unifying the representation for those two RTE types. They both now share
coltypes/coltypmods/colcollations fields. (At some point it might seem
desirable to populate those fields for all RTE types; but right now it
looks like constructing them for other RTE types would add more code and
cycles than it would save.)
The RTE change requires a catversion bump, so this fix is only usable
in HEAD. If we fix this at all in the back branches, the patch will
need to look quite different.
Report: https://postgr.es/m/20161205143037.4377.60754@wrigleys.postgresql.org
Discussion: https://postgr.es/m/27429.1480968538@sss.pgh.pa.us
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
consider_parallel_nestloop passed the wrong jointype down to its
subroutines for JOIN_UNIQUE_INNER cases (it should pass JOIN_INNER), and it
thought that it could pass paths other than innerrel->cheapest_total_path
to create_unique_path, which create_unique_path is not on board with.
These bugs would lead to assertion failures or other errors, suggesting
that this code path hasn't been tested much.
hash_inner_and_outer's code for parallel join effectively treated both
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER the same as JOIN_INNER (for
different reasons :-(), leading to incorrect plans that treated a semijoin
as if it were a plain join.
Michael Day submitted a test case demonstrating that hash_inner_and_outer
failed for JOIN_UNIQUE_OUTER, and I found the other cases through code
review.
Report: https://postgr.es/m/D0E8A029-D1AC-42E8-979A-5DE4A77E4413@rcmail.com
Oversight in my commit 0bb51aa96. Noted while poking at a recent
bug report --- HEAD's estimates for a query using CURRENT_DATE
were unexpectedly much worse than 9.6's.
Andreas Seltenreich found another case where we were being too optimistic
about allowing a plan to be considered parallelizable despite it containing
initPlans. It seems like the real issue here is that if we know we are
going to tack initPlans onto the topmost Plan node for a subquery, we
had better mark that subquery's result Paths as not-parallel-safe. That
fixes this problem and allows reversion of a kluge (added in commit
7b67a0a49 and extended in f24cf960d) to not trust the parallel_safe flag
at top level.
Discussion: <874m2w4k5d.fsf@ex.ansel.ydns.eu>
Fix thinko in commit da1c91631: even if the original query was free of
parallel hazards, we might introduce such a hazard by adding PARAM_EXEC
Param nodes. Adjust is_parallel_safe() so that it will scan the given
expression whenever any such nodes have been created. Per report from
Andreas Seltenreich.
Discussion: <878tse6yvf.fsf@credativ.de>
We mustn't force parallel mode if the query has any subplans, since
ExecSerializePlan doesn't transmit them to workers. Testing
top_plan->initPlan is inadequate because (1) there might be initPlans
attached to lower plan nodes, and (2) non-initPlan subplans don't
work either. There's certainly room for improvement in those
restrictions, but for the moment that's what we've got.
Amit Kapila, per report from Andreas Seltenreich
Discussion: <8737im6pmh.fsf@credativ.de>
create_merge_append_path should use the path rowcount it just computed,
not rel->tuples, for costing purposes. Those numbers should always be
the same at present, but if we ever support parameterized MergeAppend
paths (a case this function is otherwise prepared for), the former would
be right and the latter wrong.
No need for back-patch since the problem is only latent.
Ashutosh Bapat
Discussion: <CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com>