The hstore and json datatypes both have record-conversion functions that
pay attention to column names in the composite values they're handed.
We used to not worry about inserting correct field names into tuple
descriptors generated at runtime, but given these examples it seems
useful to do so. Observe the nicer-looking results in the regression
tests whose results changed.
catversion bump because there is a subtle change in requirements for stored
rule parsetrees: RowExprs from ROW() constructs now have to include field
names.
Andrew Dunstan and Tom Lane
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK. This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).
There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag. But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.
KaiGai Kohei, with some wordsmithing by me.
In commit 57664ed25e, I made the planner
wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs
of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some
issues with EquivalenceClass processing. However, this means that any
upper-level WHERE clauses mentioning such outputs will now contain
PlaceHolderVars after they're pushed down into the appendrel child,
and that prevents indxpath.c from recognizing that they could be matched
to index expressions. To fix, add explicit stripping of PlaceHolderVars
from index operands, same as we have long done for RelabelType nodes.
Add a regression test covering both this and the plain-UNION case (which
is a totally different code path, but should also be able to do it).
Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the
previous change.
Formerly we passed an empty list to each per-child-table invocation of
grouping_planner, and then merged the results into the global list.
However, that fails if there's a CTE attached to the statement, because
create_ctescan_plan uses the list to find the plan referenced by a CTE
reference; so it was unable to find any CTEs attached to the outer UPDATE
or DELETE. But there's no real reason not to use the same list throughout
the process, and doing so is simpler and faster anyway.
Per report from Josh Berkus of "could not find plan for CTE" failures.
Back-patch to 9.1 where we added support for WITH attached to UPDATE or
DELETE. Add some regression test cases, too.
After the planner was fixed to convert some IN/EXISTS subqueries into
semijoins or antijoins, we had to prevent it from doing that in some
cases where the plans risked getting much worse. The reason the plans
got worse was that in the unoptimized implementation, subqueries could
reference parameters from the outer query at any join level, and so
full table scans could be avoided even if they were one or more levels
of join below where the semi/anti join would be. Now that we have
sufficient mechanism in the planner to handle such cases properly,
it should no longer be necessary to play dumb here.
This reverts commits 07b9936a0f and
cd1f0d04bf. The latter was a stopgap
fix that wasn't really sufficiently analyzed at the time. Rather
than just restricting ourselves to cases where the new join can be
stacked on the right-hand input, we should also consider whether it
can be stacked on the left-hand input.
This patch fixes the planner so that it can generate nestloop-with-
inner-indexscan plans even with one or more levels of joining between
the indexscan and the nestloop join that is supplying the parameter.
The executor was fixed to handle such cases some time ago, but the
planner was not ready. This should improve our plans in many situations
where join ordering restrictions formerly forced complete table scans.
There is probably a fair amount of tuning work yet to be done, because
of various heuristics that have been added to limit the number of
parameterized paths considered. However, we are not going to find out
what needs to be adjusted until the code gets some real-world use, so
it's time to get it in there where it can be tested easily.
Note API change for index AM amcostestimate functions. I'm not aware of
any non-core index AMs, but if there are any, they will need minor
adjustments.
This reverts commit ff68b256a5.
The recent change to use -fexcess-precision=standard should make those
Asserts safe, and does fix a test case that formerly crashed for me,
so I think there's no need to have a cross-version difference in the
code here.
In commit e2c2c2e8b1 I made use of nested
list structures to show which clauses went with which index columns, but
on reflection that's a data structure that only an old-line Lisp hacker
could love. Worse, it adds unnecessary complication to the many places
that don't much care which clauses go with which index columns. Revert
to the previous arrangement of flat lists of clauses, and instead add a
parallel integer list of column numbers. The places that care about the
pairing can chase both lists with forboth(), while the places that don't
care just examine one list the same as before.
The only real downside to this is that there are now two more lists that
need to be passed to amcostestimate functions in case they care about
column matching (which btcostestimate does, so not passing the info is not
an option). Rather than deal with 11-argument amcostestimate functions,
pass just the IndexPath and expect the functions to extract fields from it.
That gets us down to 7 arguments which is better than 11, and it seems
more future-proof against likely additions to the information we keep
about an index path.
It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses. (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.) However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for. We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath. Then we can rely on the sublist
structure to identify target index columns in createplan.c. There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too. This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.
Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".
Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins. The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view. Views not configured with this
option cannot provide robust row-level security, but will perform far
better.
Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!). Review (in earlier versions) by Noah Misch and
others. Design advice by Tom Lane and myself. Further review and
cleanup by me.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple. However,
the SP-GiST AM forces the issue, so let's fix it.
This patch only installs the new API; no behavior actually changes.
While logically correct, these two Asserts could fail depending on the
vagaries of floating-point arithmetic. In particular, on machines with
floating-point registers wider than standard "double" values, it was
possible for the compiler to compare a rounded-to-double value already
stored in memory with an unrounded long double value still in a register.
Given the preceding checks, these assertions aren't adding much, so let's
just get rid of them rather than try to find a compiler-proof fix.
Per report from Pavel Stehule.
Given the lack of previous complaints, and the fact that only developers
would be likely to trip over it, I'm only going to change this in HEAD,
even though the code has been like this for a long time.
Moving the code two full tab stops to the right requires rethinking of
cosmetic code layout choices, which pgindent isn't really able to do for
us. Whitespace and comment adjustments only, no code changes.
This function has now grown enough cases that a switch seems appropriate.
This results in a measurable speed improvement on some platforms, and
should certainly not hurt. The code's in need of a pgindent run now,
though.
Andres Freund
The EvalPlanQual machinery assumes that whole-row Vars generated for the
outputs of non-table RTEs will be of composite types. However, for the
case where the RTE is a function call returning a scalar type, we were
doing the wrong thing, as a result of sharing code with a parser case
where the function's scalar output is wanted. (Or at least, that's what
that case has done historically; it does seem a bit inconsistent.)
To fix, extend makeWholeRowVar's API so that it can support both use-cases.
This fixes Belinda Cussen's report of crashes during concurrent execution
of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED
mode, we'd run the EvalPlanQual machinery after a conflicting row update
commits, and it was expecting to get a HeapTuple not a scalar datum from
the "wholerowN" variable referencing the function RTE.
Back-patch to 9.0 where the current EvalPlanQual implementation appeared.
In 9.1 and up, this patch also fixes failure to attach the correct
collation to the Var generated for a scalar-result case. An example:
regression=# select upper(x.*) from textcat('ab', 'cd') x;
ERROR: could not determine which collation to use for upper() function
Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output
expressions appear non-constant and distinct from each other. This makes
the world safe for add_child_rel_equivalences to do what it does. Before,
it was possible for that function to add identical expressions to different
EquivalenceClasses, which logically should imply merging such ECs, which
would be wrong; or to improperly add a constant to an EquivalenceClass,
drastically changing its behavior. Per report from Teodor Sigaev.
The only currently known consequence of this bug is "MergeAppend child's
targetlist doesn't match MergeAppend" planner failures in 9.1 and later.
I am suspicious that there may be other failure modes that could affect
older release branches; but in the absence of any hard evidence, I'll
refrain from back-patching further than 9.1.
inline_set_returning_function failed to distinguish functions returning
generic RECORD (which require a column list in the RTE, as well as run-time
type checking) from those with multiple OUT parameters (which do not).
This prevented inlining from happening. Per complaint from Jay Levitt.
Back-patch to 8.4 where this capability was introduced.
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation. The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
This allows us to give correct syntax error pointers when complaining
about ungrouped variables in a join query with aggregates or GROUP BY.
It's pretty much irrelevant for the planner's use of the function, though
perhaps it might aid debugging sometimes.
If the right-hand side of a semijoin is unique, then we can treat it like a
normal join (or another way to say that is: we don't need to explicitly
unique-ify the data before doing it as a normal join). We were recognizing
such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP
BY decoration, but there's another way: if the RHS is a plain relation with
unique indexes, we can check if any of the indexes prove the output is
unique. Most of the infrastructure for that was there already in the join
removal code, though I had to rearrange it a bit. Per reflection about a
recent example in pgsql-performance.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results. Per report from Marti Raudsepp,
though this is not his proposed patch.
Back-patch to 9.0, where both these features were introduced. In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.
This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.
Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan. Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.
This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple. The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes. The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.
To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns. I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion. (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.) This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.
Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
If an indexable operator for a non-collatable indexed datatype has a
collatable right-hand input type, any OpExpr for it will be marked with a
nonzero inputcollid (since having one collatable input is sufficient to
make that happen). However, an index on a non-collatable column certainly
doesn't have any collation. This caused us to fail to match such operators
to their indexes, because indxpath.c required an exact match of index
collation and clause collation. It seems correct to allow a match when the
index is collation-less regardless of the clause's inputcollid: an operator
with both noncollatable and collatable inputs could perhaps depend on the
collation of the collatable input, but it could hardly expect the index for
the noncollatable input to have that same collation.
Per bug #6232 from Pierre Ducroquet. His example is specifically about
"hstore ? text" but the problem seems quite generic.
In commit c1d9579dd8, I changed things so
that the output of the Agg node that feeds the window functions would not
list any ungrouped Vars directly. Formerly, for example, the Agg tlist
might have included both "x" and "sum(x)", which is not really valid if
"x" isn't a grouping column. If we then had a window function ordering on
something like "sum(x) + 1", prepare_sort_from_pathkeys would find no exact
match for this in the Agg tlist, and would conclude that it must recompute
the expression. But it would break the expression down to just the Var
"x", which it would find in the tlist, and then rebuild the ORDER BY
expression using a reference to the subplan's "x" output. Now, after the
above-referenced changes, "x" isn't in the Agg tlist if it's not a grouping
column, so that prepare_sort_from_pathkeys fails with "could not find
pathkey item to sort", as reported by Bricklen Anderson.
The fix is to not break down Aggrefs into their component parts, but just
treat them as irreducible expressions to be sought in the subplan tlist.
This is definitely OK for the use with respect to window functions in
grouping_planner, since it just built the tlist being used on the same
basis. AFAICT it is safe for other uses too; most of the other call sites
couldn't encounter Aggrefs anyway.
The constraint exclusion feature checks for contradictions among scan
restriction clauses, as well as contradictions between those clauses and a
table's CHECK constraints. The first aspect of this testing can be useful
for non-table relations (such as subqueries or functions-in-FROM), but the
feature was coded with only the CHECK case in mind so we were applying it
only to plain-table RTEs. Move the relation_excluded_by_constraints call
so that it is applied to all RTEs not just plain tables. With the default
setting of constraint_exclusion this results in no extra work, but with
constraint_exclusion = ON we will detect optimizations that we missed
before (at the cost of more planner cycles than we expended before).
Per a gripe from Gunnlaugur Þór Briem. Experimentation with
his example also showed we were not being very bright about the case where
constraint exclusion is proven within a subquery within UNION ALL, so tweak
the code to allow set_append_rel_pathlist to recognize such cases.
walsender.h should depend on xlog.h, not vice versa. (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.) Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h. Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.
This episode reinforces my feeling that pgrminclude should not be run
without adult supervision. Inclusion changes in header files in particular
need to be reviewed with great care. More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
Formerly, set_subquery_pathlist and other creators of plans for subqueries
saved only the rangetable and rowMarks lists from the lower-level
PlannerInfo. But there's no reason not to remember the whole PlannerInfo,
and indeed this turns out to simplify matters in a number of places.
The immediate reason for doing this was so that the subroot will still be
accessible when we're trying to extract column statistics out of an
already-planned subquery. But now that I've done it, it seems like a good
code-beautification effort in its own right.
I also chose to get rid of the transient subrtable and subrowmark fields in
SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
RelOptInfo. That required changing all the APIs in setrefs.c to pass
PlannerInfo not PlannerGlobal, which was a large but quite mechanical
transformation.
One side-effect not foreseen at the beginning is that this finally broke
inheritance_planner's assumption that replanning the same subquery RTE N
times would necessarily give interchangeable results each time. That
assumption was always pretty risky, but now we really have to make a
separate RTE for each instance so that there's a place to carry the
separate subroots.
set_append_rel_pathlist supposed that, while computing per-column width
estimates for the appendrel, it could ignore child rels for which the
translated reltargetlist entry wasn't a Var. This gave rise to completely
silly estimates in some common cases, such as constant outputs from some or
all of the arms of a UNION ALL. Instead, fall back on get_typavgwidth to
estimate from the value's datatype; which might be a poor estimate but at
least it's not completely wacko.
That problem was exposed by an Assert in set_subquery_size_estimates, which
unfortunately was still overoptimistic even with that fix, since we don't
compute attr_widths estimates for appendrels that are entirely excluded by
constraints. So remove the Assert; we'll just fall back on get_typavgwidth
in such cases.
Also, since set_subquery_size_estimates calls set_baserel_size_estimates
which calls set_rel_width, there's no need for set_subquery_size_estimates
to call get_typavgwidth; set_rel_width will handle it for us if we just
leave the estimate set to zero. Remove the unnecessary code.
Per report from Erik Rijkers and subsequent investigation.
This requires adjusting the API for syscache callback functions: they now
get a hash value, not a TID, to identify the target tuple. Most of them
weren't paying any attention to that argument anyway, but plancache did
require a small amount of fixing.
Also, improve performance a trifle by avoiding sending duplicate inval
messages when a heap_update isn't changing the catcache lookup columns.
Such a construction is useless since the lower PlaceHolderVar is already
nullable; no need to make it more so. Noted while pursuing bug #6154.
This is just a minor planner efficiency improvement, since the final plan
will come out the same anyway after PHVs are flattened. So not worth the
risk of back-patching.
A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.
While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
glibc renders random() thread-safe by wrapping a futex lock around it;
testing reveals that this limits the performance of pgbench on machines
with many CPU cores. Rather than switching to random_r(), which is
only available on GNU systems and crashes unless you use undocumented
alchemy to initialize the random state properly, switch to our built-in
implementation of erand48(), which is both thread-safe and concurrent.
Since the list of reasons not to use the operating system's erand48()
is getting rather long, rename ours to pg_erand48() (and similarly
for our implementations of lrand48() and srand48()) and just always
use those. We were already doing this on Cygwin anyway, and the
glibc implementation is not quite thread-safe, so pgbench wouldn't
be able to use that either.
Per discussion with Tom Lane.
If a Var was used only in a GROUP BY expression, the previous
implementation would include the Var by itself (as well as the expression)
in the generated targetlist. This wouldn't affect the efficiency of the
scan/join part of the plan at all, but it could result in passing
unnecessarily-wide rows through sorting and grouping steps. It turns out
to take only a little more code, and not noticeably more time, to generate
a tlist without such redundancy, so let's do that. Per a recent gripe from
HarmeekSingh Bedi.
There's a heuristic in estimate_rel_size() to clamp the minimum size
estimate for a table to 10 pages, unless we can see that vacuum or analyze
has been run (and set relpages to something nonzero, so this will always
happen for a table that's actually empty). However, it would be better
not to do this for inheritance parent tables, which very commonly are
really empty and can be expected to stay that way. Per discussion of a
recent pgsql-performance report from Anish Kejariwal. Also prevent it
from happening for indexes (although this is more in the nature of
documentation, since CREATE INDEX normally initializes relpages to
something nonzero anyway).
Back-patch to 9.0, because the ability to collect statistics across a
whole inheritance tree has improved the planner's estimates to the point
where this relatively small error makes a significant difference. In the
referenced report, merge or hash joins were incorrectly estimated as
cheaper than a nestloop with inner indexscan on the inherited table.
That was less likely before 9.0 because the lack of inherited stats would
have resulted in a default (and rather pessimistic) estimate of the cost
of a merge or hash join.
Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set. (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves. See the added regression test case for an
example.
Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause. I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it). I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.
Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release. There might
be third-party code using pull_var_clause.
get_op_btree_interpretation assumed this in order to save some duplication
of code, but it's not true in general anymore because we added <> support
to btree_gist. (We still assume it for btree opclasses, though.)
Also, essentially the same logic was baked into predtest.c. Get rid of
that duplication by generalizing get_op_btree_interpretation so that it
can be used by predtest.c.
Per bug report from Denis de Bernardy and investigation by Jeff Davis,
though I didn't use Jeff's patch exactly as-is.
Back-patch to 9.1; we do not support this usage before that.
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.
An non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.
This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.
Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Hass for documentation wording improvement
suggestions.
This patch was sponsored by Enova Financial.
Initially, we use this only to eliminate calls to the varchar()
function in cases where the length is not being reduced and, therefore,
the function call is equivalent to a RelabelType operation. The most
significant effect of this is that we can avoid a table rewrite when
changing a varchar(X) column to a varchar(Y) column, where Y > X.
Noah Misch, reviewed by me and Alexey Klyukin
When recursing after an optimization in pull_up_sublinks_qual_recurse, the
available_rels value passed down must include only the relations that are
in the righthand side of the new SEMI or ANTI join; it's incorrect to pull
up a sub-select that refers to other relations, as seen in the added test
case. Per report from BangarRaju Vadapalli.
While at it, rethink the idea of recursing below a NOT EXISTS. That is
essentially the same situation as pulling up ANY/EXISTS sub-selects that
are in the ON clause of an outer join, and it has the same disadvantage:
we'd force the two joins to be evaluated according to the syntactic nesting
order, because the lower join will most likely not be able to commute with
the ANTI join. That could result in having to form a rather large join
product, whereas the handling of a correlated subselect is not quite that
dumb. So until we can handle those cases better, #ifdef NOT_USED that
case. (I think it's okay to pull up in the EXISTS/ANY cases, because SEMI
joins aren't so inflexible about ordering.)
Back-patch to 8.4, same as for previous patch in this area. Fortunately
that patch hadn't made it into any shipped releases yet.
This has never been supported, but we previously let md.c issue the
complaint for us at whatever point we tried to examine the backing file.
Now we print a nicer error message.
Per bug #6041, reported by Emanuel, and extensive discussion with Tom
Lane over where to put the check.
The existence of a btree opclass accepting composite types caused us to
assume that every composite type is sortable. This isn't true of course;
we need to check if the column types are all sortable. There was logic
for this for the case of array comparison (ie, check that the element
type is sortable), but we missed the point for rowtypes. Per Teodor's
report of an ANALYZE failure for an unsortable composite type.
Rather than just add some more ad-hoc logic for this, I moved knowledge of
the issue into typcache.c. The typcache will now only report out array_eq,
record_cmp, and friends as usable operators if the array or composite type
will work with those functions.
Unfortunately we don't have enough info to do this for anonymous RECORD
types; in that case, just assume it will work, and take the runtime failure
as before if it doesn't.
This patch might be a candidate for back-patching at some point, but
given the lack of complaints from the field, I'd rather just test it in
HEAD for now.
Note: most of the places touched in this patch will need further work
when we get around to supporting hashing of record types.
After finding an EXISTS or ANY sub-select that can be converted to a
semi-join or anti-join, we should recurse into the body of the sub-select.
This allows cases such as EXISTS-within-EXISTS to be optimized properly.
The original coding would leave the lower sub-select as a SubLink, which
is no better and often worse than what we can do with a join. Per example
from Wayne Conrad.
Back-patch to 8.4. There is a related issue in older versions' handling
of pull_up_IN_clauses, but they're lame enough anyway about the whole area
that it seems not worth the extra work to try to fix.
The previous coding failed to account properly for the costs of evaluating
the input expressions of aggregates and window functions, as seen in a
recent gripe from Claudio Freire. (I said at the time that it wasn't
counting these costs at all; but on closer inspection, it was effectively
charging these costs once per output tuple. That is completely wrong for
aggregates, and not exactly right for window functions either.)
There was also a hard-wired assumption that aggregates and window functions
had procost 1.0, which is now fixed to respect the actual cataloged costs.
The costing of WindowAgg is still pretty bogus, since it doesn't try to
estimate the effects of spilling data to disk, but that seems like a
separate issue.
This patch is almost entirely cosmetic --- mostly cleaning up a lot of
neglected comments, and fixing code layout problems in places where the
patch made lines too long and then pgindent did weird things with that.
I did find a bug-of-omission in equalTupleDescs().
The original coding assumed that such a case represents caller error, but
actually get_relation_info will omit generating an IndexOptInfo for any
index it thinks is unsafe to use. Therefore, handle this case by returning
"true" to indicate that a seqscan-and-sort is the preferred way to
implement the CLUSTER operation. New bug in 9.1, no backpatch needed.
Per bug #5985 from Daniel Grace.
Per spec we ought to apply select_common_collation() across the expressions
in each column of the VALUES table. The original coding was just taking
the first row and assuming it was representative.
This patch adds a field to struct RangeTblEntry to carry the resolved
collations, so initdb is forced for changes in stored rule representation.
This area was a few bricks shy of a load, and badly under-commented too.
We have to ensure that the generated targetlist entries for a set-operation
node expose the correct collation for each entry, since higher-level
processing expects the tlist to reflect the true ordering of the plan's
output.
This hackery wouldn't be necessary if SortGroupClause carried collation
info ... but making it do so would inject more pain in the parser than
would be saved here. Still, we might want to rethink that sometime.
Although rowcount estimates really ought not be NaN, a bug elsewhere
could perhaps result in that, and that would cause Assert failure in
cost_mergejoin, which I believe to be the explanation for bug #5977 from
Anton Kuznetsov. Seems like a good idea to expend a couple more cycles
to prevent that, even though the real bug is elsewhere. Not back-patching,
though, because we don't encourage running production systems with
Asserts on.
When we are doing GEQO join planning, the current memory context is a
short-lived context that will be reset at the end of geqo_eval(). However,
the RelOptInfos for base relations are set up before that and then re-used
across many GEQO cycles. Hence, any code that modifies a baserel during
join planning has to be careful not to put pointers to the short-lived
context into the baserel struct. mark_dummy_rel got this wrong, leading to
easy-to-reproduce-once-you-know-how crashes in 8.4, as reported off-list by
Leo Carson of SDSC. Some improvements made in 9.0 make it difficult to
demonstrate the crash in 9.0 or HEAD; but there's no doubt that there's
still a risk factor here, so patch all branches that have the function.
(Note: 8.3 has a similar function, but it's only applied to joinrels and
thus is not a hazard.)
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings. Fix by passing it in FunctionCallInfoData
instead. In particular this allows a clean fix for bug #5970 (record_cmp
not working). This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
This warning is new in gcc 4.6 and part of -Wall. This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
This is necessary, not optional, now that ILIKE and regexes are collation
aware --- else we might derive a wrong comparison constant for index
optimized pattern matches.
Get rid of bogus collation test in match_special_index_operator (even for
ILIKE, the pattern match operator's collation doesn't matter here, and even
if it did the test was testing the wrong thing).
Fix broken looping logic in expand_indexqual_rowcompare.
Add collation check in match_clause_to_ordering_op.
Make naming and argument ordering more consistent; improve comments.
I'm not sure these have any non-cosmetic implications, but I'm not sure
they don't, either. In particular, ensure the CaseTestExpr generated
by transformAssignmentIndirection to represent the base target column
carries the correct collation, because parse_collate.c won't fix that.
Tweak lsyscache.c API so that we can get the appropriate collation
without an extra syscache lookup.
In nearly all cases, the caller already knows the correct collation, and
in a number of places, the value the caller has handy is more correct than
the default for the type would be. (In particular, this patch makes it
significantly less likely that eval_const_expressions will result in
changing the exposed collation of an expression.) So an internal lookup
is both expensive and wrong.
Ensure that parameter symbols receive collation from the function's
resolved input collation, and fix inlining to behave properly.
BTW, this commit lays about 90% of the infrastructure needed to support
use of argument names in SQL functions. Parsing of parameters is now
done via the parser-hook infrastructure ... we'd just need to supply
a column-ref hook ...
Instead of playing cute games with pathkeys, just build a direct
representation of the intended sub-select, and feed it through
query_planner to get a Path for the index access. This is a bit slower
than 9.1's previous method, since we'll duplicate most of the overhead of
query_planner; but since the whole optimization only applies to rather
simple single-table queries, that probably won't be much of a problem in
practice. The advantage is that we get to do the right thing when there's
a partial index that needs the implicit IS NOT NULL clause to be usable.
Also, although this makes planagg.c be a bit more closely tied to the
ordering of operations in grouping_planner, we can get rid of some coupling
to lower-level parts of the planner. Per complaint from Marti Raudsepp.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.
Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.
Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation. So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.
This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information. Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
CollateClause is now used only in raw grammar output, and CollateExpr after
parse analysis. This is for clarity and to avoid carrying collation names
in post-analysis parse trees: that's both wasteful and possibly misleading,
since the collation's name could be changed while the parsetree still
exists.
Also, clean up assorted infelicities and omissions in processing of the
node type.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order. Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values. And attempts to do conflicting updates will
have unpredictable results. We'll need to document all that.
This commit just fixes the code; documentation updates are waiting on
author.
Marko Tiikkaja and Hitoshi Harada
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain. While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry. That might have some other uses later, anyway.
Per discussion.
This commit provides the core code and documentation needed. A contrib
module test case will follow shortly.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected. Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes. Back-patch to 9.0 where
get_actual_variable_range() was added.
Gurjeet Singh
Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns. Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure. This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.
Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code). Back-patch to 9.0.
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
reduce_outer_joins() mistakenly treated a semijoin like a left join for
purposes of deciding whether not-null constraints created by the join's
quals could be passed down into the join's left-hand side (possibly
resulting in outer-join simplification there). Actually, semijoin works
like inner join for this purpose, ie, we do not need to see any rows that
can't possibly satisfy the quals. Hence, two-line fix to treat semi and
inner joins alike. Per observation by Andres Freund about a performance
gripe from Yazan Suleiman.
Back-patch to 8.4, since this oversight has been there since the current
handling of semijoins was implemented.
This reverts commit d1001a78ce of 2010-12-05,
which was broken as reported by Jeff Davis. The problem is that the
individual planning steps may have side-effects on substructures of
PlannerGlobal, not only the current PlannerInfo root. Arranging to keep
all such side effects in the main planning context is probably possible,
but it would change this from a quick local hack into a wide-ranging and
rather fragile endeavor. Which it's not worth.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets. This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan. The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon. Revise the data structure
so that we track resjunk column numbers separately for each subplan.
I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan. That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
Per my note of a couple days ago, create_index_paths would refuse to
consider any path at all for GIN indexes if the selectivity estimate came
out as 1.0; not even if you tried to force it with enable_seqscan. While
this isn't really a bad outcome in practice, it could be annoying for
testing purposes. Adjust the test for "is this path only useful for
sorting" so that it doesn't fire on paths with nil pathkeys, which will
include all GIN paths.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join. For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition. (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)
To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple. The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple. So we aren't increasing the size of the hashtable at all for
the feature.
To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin. Other than that decision, it was pretty
straightforward.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant. This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors. I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.
Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct. This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned. In any case it beats kluging
ruleutils.c still further. So this patch improves const-simplification
and reverts the previous change in ruleutils.c.
Back-patch to all supported branches. The bug exists in 8.1 too, but it's
out of warranty.
Avoid eating quite so much memory for large inheritance trees, by
reclaiming the space used by temporary copies of the original parsetree and
range table, as well as the workspace needed during planning. The cost is
needing to copy the finished plan trees out of the child memory context.
Although this looks like it ought to slow things down, my testing shows
it actually is faster, apparently because fewer interactions with malloc()
are needed and/or we can do the work within a more readily cacheable amount
of memory. That result might be platform-dependent, but I'll take it.
Per a gripe from John Papandriopoulos, in which it was pointed out that the
memory consumption actually grew as O(N^2) for sufficiently many child
tables, since we were creating N copies of the N-element range table.
This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.
Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.
Teodor Sigaev and Tom Lane
There were corner cases in which the planner would attempt to inline such
a function, which would result in a failure at runtime due to loss of
information about exactly what the result record type is. Fix by disabling
inlining when the function's recorded result type is RECORD. There might
be some sub-cases where inlining could still be allowed, but this is a
simple and backpatchable fix, so leave refinements for another day.
Per bug #5777 from Nate Carson.
Back-patch to all supported branches. 8.1 happens to avoid a core-dump
here, but it still does the wrong thing.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.
I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.
There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.
This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am. For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.
Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.
I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.
Teodor Sigaev, Robert Haas, Tom Lane
We no longer need the terminating zero entry in opfamily[], so get rid of
it. Also replace assorted ad-hoc looping logic with simple for and foreach
constructs. This code is now noticeably more readable than it was an hour
ago; credit to Robert for seeing that it could be simplified.
As per the ancient comment for set_rel_width, it really wasn't much good
for relations that aren't plain tables: it would never find any stats and
would always fall back on datatype-based estimates, which are often pretty
silly. Fix that by copying up width estimates from the subquery planning
process.
At some point we might want to do this for CTEs too, but that would be a
significantly more invasive patch because the sub-PlannerInfo is no longer
accessible by the time it's needed. I refrained from doing anything about
that, partly for fear of breaking the unmerged CTE-related patches.
In passing, also generate less bogus width estimates for whole-row Vars.
Per a gripe from Jon Nelson.
Fix things so that top-N sorting can be used in child Sort nodes of a
MergeAppend node, when there is a LIMIT and no intervening joins or
grouping. Actually doing this on the executor side isn't too bad,
but it's a bit messier to get the planner to cost it properly.
Per gripe from Robert Haas.
In passing, fix an oversight in the original top-N-sorting patch:
query_planner should not assume that a LIMIT can be used to make an
explicit sort cheaper when there will be grouping or aggregation in
between. Possibly this should be back-patched, but I'm not sure the
mistake is serious enough to be a real problem in practice.
Once we have found a non-null constant argument, there is no need to
examine additional arguments of the COALESCE. The previous coding got it
right only if the constant was in the first argument position; otherwise
it tried to simplify following arguments too, leading to unexpected
behavior like this:
regression=# select coalesce(f1, 42, 1/0) from int4_tbl;
ERROR: division by zero
It's a minor corner case, but a bug is a bug, so back-patch all the way.
Formerly, we could convert a UNION ALL structure inside a subquery-in-FROM
into an appendrel, as a side effect of pulling up the subquery into its
parent; but top-level UNION ALL always caused use of plan_set_operations().
That didn't matter too much because you got an Append-based plan either
way. However, now that the appendrel code can do things with MergeAppend,
it's worthwhile to hack up the top-level case so it also uses appendrels.
This is a bit of a stopgap; but going much further than this will require
a major rewrite of the planner's set-operations support, which I'm not
prepared to undertake now. For the moment let's grab the low-hanging fruit.
Per my recent proposal, get rid of all the direct inspection of indexes
and manual generation of paths in planagg.c. Instead, set up
EquivalenceClasses for the aggregate argument expressions, and let the
regular path generation logic deal with creating paths that can satisfy
those sort orders. This makes planagg.c a bit more visible to the rest
of the planner than it was originally, but the approach is basically a lot
cleaner than before. A major advantage of doing it this way is that we get
MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
practically for free, whereas in the old way we'd have had to add a whole
lot more duplicative logic.
One small disadvantage of this approach is that MIN/MAX aggregates can no
longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
that restriction or something that implies it is specified in the query.
The previous implementation was able to use the added "x IS NOT NULL"
condition as an extra predicate proof condition, but in this version we
rely entirely on indexes that are considered usable by the main planning
process. That seems a fair tradeoff for the simplicity and functionality
gained.
It was reporting that these were fully indexed (hence cheap), when of
course they're the exact opposite of that. I'm not certain if the case
would arise in practice, since a clauseless semijoin is hard to produce
in SQL, but if it did happen we'd make some dumb decisions.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
Now that we're expecting a mergeclause's left_ec/right_ec to persist from
the initial assignments, we can't just blithely zero these out when
transforming such a clause in adjust_appendrel_attrs. But really it should
be okay to keep the parent's values, since a child table's derived Var
ought to be equivalent to the parent Var for all EquivalenceClass purposes.
(Indeed, I'm wondering whether we couldn't find a way to dispense with
add_child_rel_equivalences altogether. But this is wrong in any case.)
Zoltan Boszormenyi exhibited a test case in which planning time was
dominated by construction of EquivalenceClasses and PathKeys that had no
actual relevance to the query (and in fact got discarded immediately).
This happened because we generated PathKeys describing the sort ordering of
every index on every table in the query, and only after that checked to see
if the sort ordering was relevant. The EC/PK construction code is O(N^2)
in the number of ECs, which is all right for the intended number of such
objects, but it gets out of hand if there are ECs for lots of irrelevant
indexes.
To fix, twiddle the handling of mergeclauses a little bit to ensure that
every interesting EC is created before we begin path generation. (This
doesn't cost anything --- in fact I think it's a bit cheaper than before
--- since we always eventually created those ECs anyway.) Then, if an
index column can't be found in any pre-existing EC, we know that that sort
ordering is irrelevant for the query. Instead of creating a useless EC,
we can just not build a pathkey for the index column in the first place.
The index will still be considered if it's useful for non-order-related
reasons, but we will think of its output as unsorted.
This avoids a possible crash when inlining a SRF whose argument list
contains a reference to an inline-able user function. The crash is quite
reproducible with CLOBBER_FREED_MEMORY enabled, but would be less certain
in a production build. Problem introduced in 9.0 by the named-arguments
patch, which requires invoking eval_const_expressions() before we can try
to inline a SRF. Per report from Brendan Jurd.
A couple of places in the planner need to generate whole-row Vars, and were
cutting corners by setting vartype = RECORDOID in the Vars, even in cases
where there's an identifiable named composite type for the RTE being
referenced. While we mostly got away with this, it failed when there was
also a parser-generated whole-row reference to the same RTE, because the
two Vars weren't equal() due to the difference in vartype. Fix by
providing a subroutine the planner can call to generate whole-row Vars
the same way the parser does.
Per bug #5716 from Andrew Tipton. Back-patch to 9.0 where one of the bogus
calls was introduced (the other one is new in HEAD).
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted. This should be
particularly useful for fast-start cases such as queries with LIMIT.
Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
This patch merges the responsibility for NOT-flattening into
eval_const_expressions' processing. It wasn't done that way originally
because prepqual.c is far older than eval_const_expressions. But putting
this work into eval_const_expressions saves one pass over the qual trees,
and in fact saves even more than that because we can exploit the knowledge
that the subexpressions have already been recursively simplified. Doing it
this way also lets us do it uniformly over all expressions, whereas
prepqual.c formerly just did it at top level to save cycles. That should
improve the planner's ability to recognize logically-equivalent constructs.
While at it, also add the ability to fold a NOT into BooleanTest and
NullTest constructs (the latter only for the scalar-datatype case).
Per discussion of bug #5702.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete. The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update. So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.
In passing, this patch corrects the names of some columns in the
information_schema.triggers view. It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.
Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
The point of a PlaceHolderVar is to allow a non-strict expression to be
evaluated below an outer join, after which its value bubbles up like a Var
and can be forced to NULL when the outer join's semantics require that.
However, there was a serious design oversight in that, namely that we
didn't ensure that there was actually a correct place in the plan tree
to evaluate the placeholder :-(. It may be necessary to delay evaluation
of an outer join to ensure that a placeholder that should be evaluated
below the join can be evaluated there. Per recent bug report from Kirill
Simonov.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
The previous coding would decide that join removal was unsafe upon finding
a PlaceHolderVar that needed to be evaluated at the inner rel and then used
above the join. However, this fails to cover the case of PlaceHolderVars
that refer to both the inner rel and some other rels. Per bug report from
Andrus.
In some situations the original coding led to corrupting the child AppendRel's
subpaths list, effectively adding other members of the parent's list to it.
This was usually masked because we never made any further use of the child's
list, but given the right combination of circumstances, we could do so. The
visible symptom would be a relation getting scanned twice, as in bug #5673
from David Schmitt.
Backpatch to 8.2, which is as far back as the risky coding appears. The
example submitted by David only fails in 8.4 and later, but I'm not convinced
that there aren't any even-more-obscure cases where 8.2 and 8.3 would fail.
Poking around for remaining occurrences of CVS keyword strings, I came
across one that apparently reflects the use of a $Revision: ...$ string
in the original input data. Dunno why anybody would be using that in
an MTA's Received: lines, but there it is. Put it back to the way that
it was originally, according to inspection of the CVS repo.
In these cases a qual can get marked with the removable rel in its
required_relids, but this is just to schedule its evaluation correctly, not
because it really depends on the rel. We were assuming that, in effect,
we could throw away *all* quals so marked, which is nonsense. Tighten up
the logic to be a little more paranoid about which quals belong to the
outer join being considered for removal, and arrange for all quals that
don't belong to be updated so they will still get evaluated correctly.
Also fix another problem that happened to be exposed by this test case,
which was that make_join_rel() was failing to notice some cases where
a constant-false qual could be used to prove a join relation empty. If it's
a pushed-down constant false, then the relation is empty even if it's an
outer join, because the qual applies after the outer join expansion.
Per report from Nathan Grange. Back-patch into 9.0.
used by array_agg(), string_agg(), and similar aggregate functions that use
"internal" as their transition datatype. The previous coding thought this
took *no* extra space, since "internal" is pass-by-value; but actually these
aggregates typically consume a great deal of space. Per bug #5608 from
Itagaki Takahiro, and fix suggestion from Hitoshi Harada.
Back-patch to 8.4, where array_agg was introduced.
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.
ExecReScan's second parameter is now unused, so it's removed.
sub-select contains a join alias reference that expands into an expression
containing another sub-select. Per yesterday's report from Merlin Moncure
and subsequent off-list investigation.
Back-patch to 7.4. Older versions didn't attempt to flatten sub-selects in
ways that would trigger this problem.
If such a Var appeared within a nested sub-select, we failed to translate it
correctly during pullup of the view, because the recursive call to
replace_rte_variables_mutator was looking for the wrong sublevels_up value.
Bug was introduced during the addition of the PlaceHolderVar mechanism.
Per bug #5514 from Marcos Castedo.
If the original IN operator is cross-type, for example int8 = int4,
we need to use int4 < int4 to sort the inner data and int4 = int4
to unique-ify it. We got the first part of that right, but tried to
use the original IN operator for the equality checks. Per bug #5472
from Vlad Romascanu.
Backpatch to 8.4, where the bug was introduced by the patch that unified
SortClause and GroupClause. I was able to take out a whole lot of on-the-fly
calls of get_equality_op_for_ordering_op(), but failed to realize that
I needed to put one back in right here :-(
MIN or MAX, we must take care to insert the added qual in a legal place among
the existing indexquals, if any. The btree index AM requires the quals to
appear in index-column order. We didn't have to worry about this before
because "target IS NOT NULL" was just treated as a plain scan filter condition;
but as of 9.0 it can be an index qual and then it has to follow the rule.
Per report from Ian Barwick.
The logic for determining whether to materialize has been significantly
overhauled for 9.0. In case there should be any doubt about whether
materialization is a win in any particular case, this should provide a
convenient way of seeing what happens without it; but even with enable_material
turned off, we still materialize in cases where it is required for
correctness.
Thanks to Tom Lane for the review.
constraint exclusion on an inheritance set that is the target of an UPDATE
or DELETE query. Per gripe from Marc Cousin. Back-patch to 8.4 where
the feature was introduced.
catalog entries via SearchSysCache and related operations. Although, at the
time that these callbacks are called by elog.c, we have not officially aborted
the current transaction, it still seems rather risky to initiate any new
catalog fetches. In all these cases the needed information is readily
available in the caller and so it's just a matter of a bit of extra notation
to pass it to the callback.
Per crash report from Dennis Koegel. I've concluded that the real fix for
his problem is to clear the error context stack at entry to proc_exit, but
it still seems like a good idea to make the callbacks a bit less fragile
for other cases.
Backpatch to 8.4. We could go further back, but the patch doesn't apply
cleanly. In the absence of proof that this fixes something and isn't just
paranoia, I'm not going to expend the effort.
We had originally made the stronger assumption that NOT A refutes any B
if B implies A, but this fails in three-valued logic, because we need to
prove B is false not just that it's not true. However the logic does
go through if B is equal to A.
Recognizing this limited case is enough to handle examples that arise when
we have simplified "bool_var = true" or "bool_var = false" to just "bool_var"
or "NOT bool_var". If we had not done that simplification then the
btree-operator proof logic would have been able to prove that the expressions
were contradictory, but only for identical expressions being compared to the
constants; so handling identical A and B covers all the same cases.
The motivation for doing this is to avoid unexpected asymmetrical behavior
when a partitioned table uses a boolean partitioning column, as in today's
gripe from Dominik Sander.
Back-patch to 8.2, which is as far back as predicate_refuted_by attempts to
do anything at all with NOTs.
tuple, instead of the former cpu_tuple_cost. It is sane to charge less than
cpu_tuple_cost because Materialize never does any qual-checking or projection,
so it's got less overhead than most plan node types. In particular, we want
to have the same charge here as is charged for readout in cost_sort. That
avoids the problem recently exhibited by Teodor wherein the planner prefers
a useless sort over a materialize step in a context where a lot of rescanning
will happen. The rescan costs should be just about the same for both node
types, so make their estimates the same.
Not back-patching because all of the current logic for rescan cost estimates
is new in 9.0. The old handling of rescans is sufficiently not-sane that
changing this in that structure is a bit pointless, and might indeed cause
regressions.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)
Hitoshi Harada, reviewed by Pavel Stehule
The previous coding missed a bet by sometimes picking the "sorted" path
from query_planner even though hashing would be preferable. To fix, we have
to be willing to make the choice sooner. This contorts things a little bit,
but I thought of a factorization that makes it not too awful.
When a column is renamed, we recursively rename the same column in
all descendent tables. But if one of those tables also inherits that
column from a table outside the inheritance hierarchy rooted at the
named table, we must throw an error. The previous coding correctly
prohibited the rename when the parent had inherited the column from
elsewhere, but overlooked the case where the parent was OK but a child
table also inherited the same column from a second, unrelated parent.
For now, not backpatched due to lack of complaints from the field.
KaiGai Kohei, with further changes by me.
Reviewed by Bernd Helme and Tom Lane.
when the planner splits apart a ROW(...) IS NULL test, the argisrow values
of the component tests have to be determined from the component field types,
not copied from the original NullTest (in which argisrow is surely true).
EXISTS that contains a WITH clause. This would usually lead to a
"could not find CTE" error later in planning, because the WITH wouldn't
get processed at all. Noted while playing with an example from Ken Marshall.
parse analysis phase, rather than at execution time. This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references. plancache.c gets a little bit
grottier, but the alternatives seem worse.
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries. In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality
regression for this type of query.
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
8.2beta but never carried out. This avoids repetitive tests of whether the
argument is of scalar or composite type. Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL". Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
probably got there via blind copy-and-paste from one of the legitimate
callers, so rearrange and comment that code a bit to make it clearer that
this isn't a necessary prerequisite to hash_create. Per observation
from Robert Haas.
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
Andrew Gierth, reviewed by Hitoshi Harada
we have to cope with the possibility that the declared result rowtype contains
dropped columns. This fails in 8.4, as per bug #5240.
While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement. However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION. In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
by adding a requirement that build_join_rel add new join RelOptInfos to the
appropriate list immediately at creation. Per report from Robert Haas,
the list_concat_unique_ptr() calls that this change eliminates were taking
the lion's share of the runtime in larger join problems. This doesn't do
anything to fix the fundamental combinatorial explosion in large join
problems, but it should push out the threshold of pain a bit further.
Note: because this changes the order in which joinrel lists are built,
it might result in changes in selected plans in cases where different
alternatives have exactly the same costs. There is one example in the
regression tests.
non-Var sort/group expressions using ressortgroupref labels instead of
depending entirely on equal()-ity of the upper node's tlist expressions
to the lower node's. This avoids emitting the wrong outputs in cases
where there are textually identical volatile sort/group expressions,
as for example
select distinct random(),random() from generate_series(1,10);
Per report from Andrew Gierth.
Backpatch to 8.4. Arguably this is wrong all the way back, but the only known
case where there's an observable problem is when using hash aggregation to
implement DISTINCT, which is new as of 8.4. So for the moment I'll refrain
from backpatching further.
mergejoin to shield it from doing mark/restore and refetches. Put an explicit
flag in MergePath so we can centralize the logic that knows about this,
and add costing logic that considers using Materialize even when it's not
forced by the previously-existing considerations. This is in response to
a discussion back in August that suggested that materializing an inner
indexscan can be helpful when the refetch percentage is high enough.
underneath the Limit node, not atop it. This fixes the old problem that such
a query might unexpectedly return fewer rows than the LIMIT says, due to
LockRows discarding updated rows.
There is a related problem that LockRows might destroy the sort ordering
produced by earlier steps; but fixing that by pushing LockRows below Sort
would create serious performance problems that are unjustified in many
real-world applications, as well as potential deadlock problems from locking
many more rows than expected. Instead, keep the present semantics of applying
FOR UPDATE after ORDER BY within a single query level; but allow the user to
specify the other way by writing FOR UPDATE in a sub-select. To make that
work, track whether FOR UPDATE appeared explicitly in sub-selects or got
pushed down from the parent, and don't flatten a sub-select that contained an
explicit FOR UPDATE.
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
are named in the UPDATE's SET list.
Note: the schema of pg_trigger has not actually changed; we've just started
to use a column that was there all along. catversion bumped anyway so that
this commit is included in the history of potentially interesting changes
to system catalog contents.
Itagaki Takahiro
execMain.c and into a new plan node type LockRows. Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree. It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.
This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though. Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.
Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree. In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy. But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.
Marko Tiikkaja
The original coding correctly noted that these aren't just redundancies
(they're effectively X IS NOT NULL, assuming = is strict). However, they
got treated that way if X happened to be in a single-member EquivalenceClass
already, which could happen if there was an ORDER BY X clause, for instance.
The simplest and most reliable solution seems to be to not try to process
such clauses through the EquivalenceClass machinery; just throw them back
for traditional processing. The amount of work that'd be needed to be
smarter than that seems out of proportion to the benefit.
Per bug #5084 from Bernt Marius Johnsen, and analysis by Andrew Gierth.
tests into a small common subroutine, and eliminate an unnecessary difference
in the order in which conditions are tested. Per a comment from Robert Haas.
is unique and is not referenced above the join. In this case the inner
side doesn't affect the query result and can be thrown away entirely.
Although perhaps nobody would ever write such a thing by hand, it's
a reasonably common case in machine-generated SQL.
The current implementation only recognizes the case where the inner side
is a simple relation with a unique index matching the query conditions.
This is enough for the use-cases that have been shown so far, but we
might want to try to handle other cases later.
Robert Haas, somewhat rewritten by Tom
an explicit model of rescan costs being different from first-time costs.
The costing of Material nodes in particular now has some visible relationship
to the actual runtime behavior, where before it was essentially fantasy.
This also fixes up a couple of places where different materialized plan types
were treated differently for no very good reason (probably just oversights).
A couple of the regression tests are affected, because the planner now chooses
to put the other relation on the inside of a nestloop-with-materialize.
So far as I can see both changes are sane, and the planner is now more
consistently following the expectation that it should prefer to materialize
the smaller of two relations.
Per a recent discussion with Robert Haas.
In this case we generate two PathKey references to the expression (one for
DISTINCT and one for ORDER BY) and they really need to refer to the same
EquivalenceClass. However get_eclass_for_sort_expr was being overly paranoid
and creating two different EC's. Correct behavior is to use the SortGroupRef
index to decide whether two references to volatile expressions that are
equal() (ie textually equivalent) should be considered the same.
Backpatch to 8.4. Possibly this should be changed in 8.3 as well, but
I'll refrain in the absence of evidence of a visible failure in that branch.
Per bug #5049.
that's generated for a whole-row Var referencing the subquery, when the
subquery is in the nullable side of an outer join. The previous coding
instead put PlaceHolderVars around the elements of the RowExpr. The effect
was that when the outer join made the subquery outputs go to null, the
whole-row Var produced ROW(NULL,NULL,...) rather than just NULL. There
are arguments afoot about whether those things ought to be semantically
indistinguishable, but for the moment they are not entirely so, and the
planner needs to take care that its machinations preserve the difference.
Per bug #5025.
Making this feasible required refactoring ResolveNew() to allow more caller
control over what is substituted for a Var. I chose to make ResolveNew()
a wrapper around a new general-purpose function replace_rte_variables().
I also fixed the ancient bogosity that ResolveNew might fail to set
a query's hasSubLinks field after inserting a SubLink in it. Although
all current callers make sure that happens anyway, we've had bugs of that
sort before, and it seemed like a good time to install a proper solution.
Back-patch to 8.4. The problem can be demonstrated clear back to 8.0,
but the fix would be too invasive in earlier branches; not to mention
that people may be depending on the subtly-incorrect behavior. The
8.4 series is new enough that fixing this probably won't cause complaints,
but it might in older branches. Also, 8.4 shows the incorrect behavior
in more cases than older branches do, because it is able to flatten
subqueries in more cases.
I mistakenly removed it last month, thinking it was no longer needed ---
but it is still needed for dealing with joininfo lists. Fortunately this
bit of brain fade hadn't made it into any released versions yet.
Both hex format and the traditional "escape" format are automatically
handled on input. The output format is selected by the new GUC variable
bytea_output.
As committed, bytea_output defaults to HEX, which is an *incompatible
change*. We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.
Peter Eisentraut
for the case that the semijoin was implemented within either input by
unique-ifying its RHS before we test to see if it appears to match the current
join situation. The previous coding would select semijoin logic in situations
where we'd already unique-ified the RHS and joined it to some unrelated
relation(s), and then came to join it to the semijoin's LHS. That still gave
the right answer as far as the semijoin itself was concerned, but would lead
to incorrectly examining only an arbitrary one of the matchable rows from the
unrelated relation(s). The cause of this thinko was incorrect unification of
the pre-8.4 logic for IN joins and OUTER joins --- the comparable case for
outer joins can be handled after making the match test, but that's because
there is nothing like the unique-ification escape hatch for outer joins.
Per bug #4934 from Benjamin Reed.
reorder a semijoin into or out of the righthand side of another semijoin,
but actually it doesn't work to reorder it into or out of the righthand
side of a left or antijoin, either. Per bug #4906 from Mathieu Fenniak.
This was sloppy thinking on my part. This identity does work:
( A left join B on (Pab) ) semijoin C on (Pac)
==
( A semijoin C on (Pac) ) left join B on (Pab)
but I failed to see that that doesn't mean this does:
( A left join B on (Pab) ) semijoin C on (Pbc)
!=
A left join ( B semijoin C on (Pbc) ) on (Pab)
foo <> false, along with its previous duties of simplifying foo = true
and foo = false. (All of these are equivalent to just foo or NOT foo
as the case may be.) It's not clear how often this is really useful;
but it costs almost nothing to do, and it seems some people think we
should be smart about such cases. Per recent bug report.
sequence, even when the input "tour" doesn't lead directly to such a sequence.
The stack logic that was added in 2004 only supported cases where relations
that had to be joined to each other (due to join order restrictions) were
adjacent in the tour. However, relying on a random search to figure that out
is tremendously inefficient in large join problems, and could even fail
completely (leading to "failed to make a valid plan" errors) if
random_init_pool ran out of patience. It seems better to make the
tour-to-plan transformation a little bit fuzzier so that every tour can form
a legal plan, even though this means that apparently different tours will
sometimes yield the same plan.
In the same vein, get rid of the logic that knew that tours (a,b,c,d,...)
are the same as tours (b,a,c,d,...), and therefore insisted the latter
are invalid. The chance of generating two tours that differ only in
this way isn't that high, and throwing out 50% of possible tours to
avoid such duplication seems more likely to waste valuable genetic-
refinement generations than to do anything useful.
This leaves us with no cases in which geqo_eval will deem a tour invalid,
so get rid of assorted kluges that tried to deal with such cases, in
particular the undocumented assumption that DBL_MAX is an impossible
plan cost.
This is all per testing of Robert Haas' lets-remove-the-collapse-limits
patch. That idea has crashed and burned, at least for now, but we still
got something useful out of it.
It's possible we should back-patch this change, since the "failed to make a
valid plan" error can happen in existing releases; but I'd rather not until
it has gotten more testing.
by unique-ifying the RHS and then inner-joining to some other relation,
that is not grounds for violating the RHS of some other outer join.
Noticed while regression-testing new GEQO code, which will blindly follow
any path that join_is_legal says is legal, and then complain later if that
leads to a dead end.
I'm not certain that this can result in any visible failure in 8.4: the
mistake may always be masked by the fact that subsequent attempts to join
the rest of the RHS of the other join will fail. But I'm not certain it
can't, either, and it's definitely not operating as intended. So back-patch.
The added regression test depends on the new no-failures-allowed logic
that I'm about to commit in GEQO, so no point back-patching that.
that the sanity checking I added to create_mergejoin_plan() in 8.3 was a
few bricks shy of a load: the mergeclauses could reference pathkeys in a
noncanonical order such as x,y,x, not only cases like x,x,y which is all
that the code had allowed for. The odd cases only turn up when using
redundant clauses in an outer join condition, which is why no one had
noticed before.
random number seed each time. This is how it used to work years ago, but
we got rid of the seed reset because it was resetting the main random()
sequence and thus having undesirable effects on the rest of the system.
To fix, establish a private random number state for each execution of
geqo(), and initialize the state using the new GUC variable geqo_seed.
People who want to experiment with different random searches can do so
by changing geqo_seed, but you'll always get the same plan for the same
value of geqo_seed (if holding all other planner inputs constant, of course).
The new state is kept in PlannerInfo by adding a "void *" field reserved
for use by join_search hooks. Most of the rather bulky code changes in
this commit are just arranging to pass PlannerInfo around to all the GEQO
functions (many of which formerly didn't receive it).
Andres Freund, with some editorialization by Tom
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright. You still
(probably) need extern "C" { }; around the inclusion of backend headers.
based on a patch by Kurt Harriman <harriman@acm.org>
Also add a script cpluspluscheck to check for C++ compatibility in the
future. As of right now, this passes without error for me.
RelOptInfo targetlist. It used to be that the only possibility other than
a Var was a RowExpr representing a whole-row child Var, but as of 8.4's
expanded ability to flatten appendrel members, we can get arbitrary expressions
in there. Use the expression's type info and get_typavgwidth() to produce
an at-least-marginally-sane result. Note that get_typavgwidth()'s fallback
estimate (32 bytes) is the same as what was here before, so there will be
no behavioral change for RowExprs. Noted while looking at recent gripe
about constant quals pushed down to FunctionScan appendrel members ...
not only were we failing to recognize the constant qual, we were getting
the width estimate wrong :-(
substituting a child rel's output expressions into the appendrel's restriction
clauses yields a pseudoconstant restriction. We might be able to skip scanning
that child rel entirely (if we get constant FALSE), or generate a one-time
filter. 8.3 more or less accidentally generated plans that weren't completely
stupid in these cases, but that was only because an extra recursive level of
subquery_planner() always occurred and allowed const-simplification to happen.
8.4's ability to pull up appendrel members with non-Var outputs exposes the
fact that we need to work harder here. Per gripe from Sergey Burladyan.
the "cteParam" as a proxy for the possibility that the underlying CTE plan
depends on outer-level variables or Params, but that doesn't work very well
because it sometimes causes calling subqueries to be treated as SubPlans when
they could be InitPlans. This is inefficient and also causes the outright
failure exhibited in bug #4902. Instead, leave the cteParam out of it and
copy the underlying CTE plan's extParams directly. Per bug #4902 from
Marko Tiikkaja.
ability to lock relations as they scan pg_inherits, and to ignore any
relations that have disappeared by the time we get lock on them. This
makes uses of these functions safe against concurrent DROP operations
on child tables: we will effectively ignore any just-dropped child,
rather than possibly throwing an error as in recent bug report from
Thomas Johansson (and similar past complaints). The behavior should
not change otherwise, since the code was acquiring those same locks
anyway, just a little bit later.
An exception is LockTableCommand(), which is still behaving unsafely;
but that seems to require some more discussion before we change it.
find_inheritance_children() and find_all_inheritors(). I got annoyed that
these are buried inside the planner but mostly used elsewhere. So, create
a new file catalog/pg_inherits.c and put them there, along with a couple
of other functions that search pg_inherits.
The code that modifies pg_inherits is (still) in tablecmds.c --- it's
kind of entangled with unrelated code that modifies pg_depend and other
stuff, so pulling it out seemed like a bigger change than I wanted to make
right now. But this file provides a natural home for it if anyone ever
gets around to that.
This commit just moves code around; it doesn't change anything, except
I succumbed to the temptation to make a couple of trivial optimizations
in typeInheritsFrom().
of AND/OR clause branches that predtest.c would attempt to deal with. As
noted in bug #4721, that change disabled proof attempts for sizes of problems
that people are actually expecting it to work for. The original complaint
it was trying to solve was O(N^2) behavior for long IN-lists, so let's try
applying the limit to just ScalarArrayOpExprs rather than everything.
Another case of "foolish consistency" I fear.
Back-patch to 8.2, same as the previous patch was.
predicate_refuted_by: if either top-level input is a single-element list,
reduce it to its lone member before proceeding. This avoids
a useless level of AND-recursion within the recursive proof routines.
It's worth doing because, for example, if the clause is a 100-element
list and the predicate is a 1-element list then we'd otherwise strip
the predicate's list structure 100 times as we iterate through the clause.
It's only needed at top level because there won't be any trivial ANDs below
that --- this situation is an artifact of the decision to represent even
single-item conditions as Lists in the "implicit AND" format, and that format
is only used at the top level of any predicate or restriction condition.
joins a bit better, ie, understand the differing cost functions for matched
and unmatched outer tuples. There is more that could be done in cost_hashjoin
but this already helps a great deal. Per discussions with Robert Haas.
restrictions specified for semijoins in optimizer/README, to wit that
you can't reassociate outer joins into or out of the RHS of a semijoin.
Per report from Heikki.
can be pushed to the top of the join tree, we update both the relids and
qualscope variables to keep them in sync. This prevents a possible later
failure of an Assert clause, and affects nothing else since qualscope isn't
used later except for that Assert. At the moment the Assert shouldn't be
reachable when we've pushed the qual up; but this is cheap insurance, and
it's more sensible anyway in terms of the overall logic of the routine.
Per analysis of a bug report from Stefan Huehner.
I'm not back-patching this since it's just future-proofing; but if anyone
gets tempted to change check_outerjoin_delay again in the back branches,
this might be needed.
PlaceHolderVar nodes in join quals appearing in or below the lowest
outer join that could null the subquery being pulled up. This improves
the planner's ability to recognize constant join quals, and probably
helps with detection of common sort keys (equivalence classes) as well.
aggregate function. By definition, such a sub-SELECT cannot reference any
variables of query levels between itself and the aggregate's semantic level
(else the aggregate would've been assigned to that lower level instead).
So the correct, most efficient implementation is to treat the sub-SELECT as
being a sub-select of that outer query level, not the level the aggregate
syntactically appears in. Not doing so also confuses the heck out of our
parameter-passing logic, as illustrated in bug report from Daniel Grace.
Fortunately, we were already copying the whole Aggref expression up to the
outer query level, so all that's needed is to delay SS_process_sublinks
processing of the sub-SELECT until control returns to the outer level.
This has been broken since we introduced spec-compliant treatment of
outer aggregates in 7.4; so patch all the way back.
Stefan Kaltenbrunner. The most reasonable behavior (at least for the near
term) seems to be to ignore the PlaceHolderVar and examine its argument
instead. In support of this, change the API of pull_var_clause() to allow
callers to request recursion into PlaceHolderVars. Currently
estimate_num_groups() is the only customer for that behavior, but where
there's one there may be others.
constants through full joins, as in
select * from tenk1 a full join tenk1 b using (unique1)
where unique1 = 42;
which should generate a fairly cheap plan where we apply the constraint
unique1 = 42 in each relation scan. This had been broken by my patch of
2008-06-27, which is now reverted in favor of a more invasive but hopefully
less incorrect approach. That patch was meant to prevent incorrect extraction
of OR'd indexclauses from OR conditions above an outer join. To do that
correctly we need more information than the outerjoin_delay flag can provide,
so add a nullable_relids field to RestrictInfo that records exactly which
relations are nulled by outer joins that are underneath a particular qual
clause. A side benefit is that we can make the test in create_or_index_quals
more specific: it is now smart enough to extract an OR'd indexclause into the
outer side of an outer join, even though it must not do so in the inner side.
The old coding couldn't distinguish these cases so it could not do either.
are individually labeled, rather than just grouped under an "InitPlan"
or "SubPlan" heading. This in turn makes it possible for decompilation of
a subplan reference to usefully identify which subplan it's referencing.
I also made InitPlans identify which parameter symbol(s) they compute,
so that references to those parameters elsewhere in the plan tree can
be connected to the initplan that will be executed. Per a gripe from
Robert Haas about EXPLAIN output of a WITH query being inadequate,
plus some longstanding pet peeves of my own.
temp relations; this is no more expensive than before, now that we have
pg_class.relistemp. Insert tests into bufmgr.c to prevent attempting
to fetch pages from nonlocal temp relations. This provides a low-level
defense against bugs-of-omission allowing temp pages to be loaded into shared
buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop.
While at it, tweak a bunch of places to use new relcache tests (instead of
expensive probes into pg_namespace) to detect local or nonlocal temp tables.
"physical tlist" optimization on the outer relation (ie, force a projection
step to occur in its scan). This avoids storing useless column values when
the outer relation's tuples are written to temporary batch files.
Modified version of a patch by Michael Henderson and Ramon Lawrence.
distribution, by creating a special fast path for the (first few) most common
values of the outer relation. Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.
Bryce Cutt and Ramon Lawrence, with some editorialization by me.
exact-match pattern (no wildcard) can be index-optimized in some cases where a
prefix-match pattern cannot; specifically, since the required index clause is
simple equality, it works for regular text/varchar indexes even when the
locale is not C. I'm not sure how often this case really comes up, but since
it requires hardly any additional work to handle it, we might as well get it
right. Motivated by a discussion on the JDBC list.
for consistency with the (relatively) recent addition of typmod to SubLink.
An example of why it's a good idea is to be seen in the recent "failed to
locate grouping columns" bug, which wouldn't have happened if a SubPlan
exposed the same typmod info as the SubLink it was derived from.
This could be back-patched, since it doesn't affect any on-disk data format,
but for the moment it doesn't seem necessary to do so.
by the planning process. This prevents the "failed to locate grouping columns"
error recently reported by Dickson Guedes. That happens because planning
replaces SubLinks by SubPlans in the subquery's targetlist, and exprTypmod()
is smarter about the former than the latter, causing the apparent type of
the subquery's output columns to change. This seems to be a deficiency we
should fix in exprTypmod(), but that will be a much more invasive patch
with possible side-effects elsewhere, so I'll do that only in HEAD.
Back-patch to 8.3. Arguably the lack of a copying step is broken/dangerous
all the way back, but in the absence of known problems I'll refrain from
making the older branches pay the extra cost. (The reason this particular
symptom didn't appear before is that exprTypmod() wasn't smart about SubLinks
either, until 8.3.)
amgettuple or only implement amgetbitmap, instead of the former assumption
that every AM supports both APIs. Extracted with minor editorialization
from Teodor's fast-GIN-insert patch; whatever becomes of that, this seems
like a simple and reasonable generalization of the index AM interface spec.
attribute numbering. Also, a parent whole-row reference should not require
select privilege on child columns that aren't inherited from the parent.
Problem diagnosed by KaiGai Kohei, though this isn't exactly his patch.
input lists before we grovel through the lists. This doesn't save much,
but testing shows that the case of both inputs NIL is common enough that
it saves something. And this is used enough to be a hotspot.
the ON clause of an outer join. Doing so is semantically correct but results
in de-optimizing queries that were structured to take advantage of the sublink
style of execution, as seen in recent complaint from Kevin Grittner. Since
the user can get the other behavior by reorganizing his query, having the
flattening happen automatically is just a convenience, and that doesn't
justify breaking existing applications. Eventually it would be nice to
re-enable this, but that seems to require a significantly different approach
to outer joins in the executor.
to be syntactically part of a semijoin clause. For example given
WHERE EXISTS(SELECT ... WHERE upper.var = lower.var AND some-condition)
where some-condition is just a restriction on the lower relation, we can
use unique-ification on lower.var after having applied some-condition within
the scan on lower.
making pull_up_sublinks() construct a full-blown JoinExpr tree representation
of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
This makes pull_up_sublinks() a shade more complex, but the gain in semantic
clarity is worth it. I still have more to do in this area to address the
previously-discussed problems, but this commit in itself fixes at least one
bug in HEAD, as shown by added regression test case.
IS NULL condition is rendered redundant by detection of an antijoin.
If we know that a join is an antijoin, then *any* Var coming out of its
righthand side must be NULL, not only the joining column(s). Also,
it's still gonna be null after being passed up through higher joins,
whether they're outer joins or not. I was misled by a faulty analogy
to reduce_outer_joins() in the original coding. But consider
select * from a left join b on a.x = b.y where b.y is null and b.z is null;
The first IS NULL condition justifies deciding that the join is an antijoin
(if the = is strict) and then the second one is just plain redundant.
unique for a particular query, if the index predicate is satisfied. This
requires a bit of reordering of operations so that we check the predicates
before doing any selectivity estimates, but shouldn't really cause any
noticeable slowdown. Per a comment from Michal Politowski.
keys when considering a semi or anti join. This requires estimating the
selectivity of the merge qual as though it were a regular inner join condition.
To allow caching both that and the real outer-join-aware selectivity, split
RestrictInfo.this_selec into two fields.
This fixes one of the problems reported by Kevin Grittner.
the cheapest-total inner path as a new candidate while truncating the
sort key list, if it already matched the full sort key list. This is
too much of a corner case to be worth back-patching, since it's unusual
for the cheapest total path to be sorted, and anyway no real harm is
done (except in JOIN_SEMI/ANTI cases where cost_mergejoin is a bit
broken at the moment). But it wasn't behaving as intended, so fix it.
Noted while examining a test case from Kevin Grittner. This error doesn't
explain his issue, but it does explain why "set enable_seqscan = off"
seemed to reproduce it for me.
that are set up for execution with ExecPrepareExpr rather than going through
the full planner process. By introducing an explicit notion of "expression
planning", this patch also lays a bit of groundwork for maybe someday
allowing sub-selects in standalone expressions.
the default. This setting enables constraint exclusion checks only for
appendrel members (ie, inheritance children and UNION ALL arms), which are
the cases in which constraint exclusion is most likely to be useful. Avoiding
the overhead for simple queries that are unlikely to benefit should bring
the cost down to the point where this is a reasonable default setting.
Per today's discussion.
default expressions to a function call, eval_const_expressions must recurse on
those expressions. Else they don't get simplified, and in particular we fail
to insert additional default arguments if any functions needing defaults are
in there. Per report from Rushabh Lathia.
patch. This includes the ability to force the frame to cover the whole
partition, and the ability to make the frame end exactly on the current row
rather than its last ORDER BY peer. Supporting any more of the full SQL
frame-clause syntax will require nontrivial hacking on the window aggregate
code, so it'll have to wait for 8.5 or beyond.
form a join and that case doesn't have anything to join to. (We could
probably make it work if we didn't pull up the subquery, but it seems to
me that the case isn't worth extra code.) Per report from Greg Stark.
outer join clauses. Given, say,
... from a left join b on a.a1 = b.b1 where a.a1 = 42;
we'll deduce a clause b.b1 = 42 and then mark the original join clause
redundant (we can't remove it completely for reasons I don't feel like
squeezing into this log entry). However the original implementation of
that wasn't bulletproof, because clause_selectivity() wouldn't honor
this_selec if given nonzero varRelid --- which in practice meant that
it worked as desired *except* when considering index scan quals. Which
resulted in bogus underestimation of the size of the indexscan result for
an inner indexscan in an outer join, and consequently a possibly bad
choice of indexscan vs. bitmap scan. Fix by introducing an explicit test
into clause_selectivity(). Also, to make sure we don't trigger that test
in corner cases, change the convention to be that this_selec > 1, not
this_selec = 1, means it's been marked redundant. Per trouble report from
Scara Maccai.
Back-patch to 8.2, where the problem was introduced.
RHS that can't be unique-ified --- join_is_legal has to check that before
deciding to build a join, else we'll have an unimplementable joinrel.
Per report from Greg Stark.
though it is an inner rather than outer join type. This essentially means
that we don't bother to separate "pushed down" qual conditions from actual
join quals at a semijoin plan node; which is okay because the restrictions of
SQL syntax make it impossible to have a pushed-down qual that references the
inner side of a semijoin. This allows noticeably better optimization of
IN/EXISTS cases than we had before, since the equivalence-class machinery can
now use those quals. Also fix a couple of other mistakes that had essentially
disabled the ability to unique-ify the inner relation and then join it to just
a subset of the left-hand relations. An example case using the regression
database is
select * from tenk1 a, tenk1 b
where (a.unique1,b.unique2) in (select unique1,unique2 from tenk1 c);
which is planned reasonably well by 8.3 and earlier but had been forcing a
cartesian join of a/b in CVS HEAD.
as LIKE. I oversimplified this code when removing support for plan-time
determination of index operator lossiness back in April --- I had thought
create_bitmap_subplan could stop returning two separate lists of qual
conditions, but it still must so that we can treat special operators
correctly in create_bitmap_scan_plan. Per report from Rushabh Lathia.
return the tableoid as well as the ctid for any FOR UPDATE targets that
have child tables. All child tables are listed in the ExecRowMark list,
but the executor just skips the ones that didn't produce the current row.
Curiously, this longstanding restriction doesn't seem to have been documented
anywhere; so no doc changes.
operator. The result depends only on the two input operators and the proof
direction (imply or refute), so it's easy to cache. This provides a very
large savings in cases such as Sergey Konoplev's long NOT-IN-list example,
where predtest spends all its time repeatedly figuring out that the same pair
of operators cannot be used to prove anything. (But of course the O(N^2)
behavior still catches up with you eventually.) I'm not convinced it buys
a whole lot when constraint_exclusion isn't turned on, but it's not a lot
of added code so we might as well cache all the time.
AND, OR, or equivalent clauses: if there are too many (more than 100) just
exit without proving anything. This ensures that we don't spend O(N^2) time
trying (and most likely failing) to prove anything about very long IN lists
and similar cases.
Also, install a couple of CHECK_FOR_INTERRUPTS calls to ensure that a long
proof attempt can be interrupted.
Per gripe from Sergey Konoplev.
Back-patch the whole patch to 8.2 and just the CHECK_FOR_INTERRUPTS addition
to 8.1. (The rest of the patch doesn't apply cleanly, and since 8.1 doesn't
show the complained-of behavior anyway, it doesn't seem necessary to work
hard on it.)
translated_vars list get updated when pulling up an appendrel member. It's
not clear that this really matters at present, since relatively little gets
done with the outputs of an appendrel child relation; but it probably will
come back to bite us sometime if we leave them with the wrong values.
we extended the appendrel mechanism to support UNION ALL optimization. The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.
In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values). Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.
Kris Jurka
until vars are distributed to rels during query_planner() startup. We don't
really need it before that, and not building it early has some advantages.
First, we don't need to put it through the various preprocessing steps, which
saves some cycles and eliminates the need for a number of routines to support
PlaceHolderInfo nodes at all. Second, this means one less unused plan for any
sub-SELECT appearing in a placeholder's expression, since we don't build
placeholder_list until after sublink expansion is complete.
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).
The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join. Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join. When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results. This fixes a planner
limitation that's existed since 7.1.
In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
set_rel_width(). The code had been catering for the possibility of different
varnos in the relation targetlist, but this is impossible for a base relation
(and if it were possible, putting all the widths in the same RelOptInfo would
be wrong anyway).
implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
the column alias names of the RTE referenced by the Var to the RowExpr.
This is needed to allow ruleutils.c to correctly deparse FieldSelect nodes
referencing such a construct. Per my recent bug report.
Adding a field to RowExpr forces initdb (because of stored rules changes)
so this solution is not back-patchable; which is unfortunate because 8.2
and 8.3 have this issue. But it only affects EXPLAIN for some pretty odd
corner cases, so we can probably live without a solution for the back
branches.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
btree. We can't easily tell whether clauses generated from the equivalence
class could be used with such an index, so just assume that they might be.
This bit of over-optimization prevented use of non-btree indexes for nestloop
inner indexscans, in any case where the join uses an equality operator that
is also a btree operator --- which in particular is typically true for hash
indexes. Noted while trying to test the current hash index patch.
when user-defined functions used in a plan are modified. Also invalidate
plans when schemas, operators, or operator classes are modified; but for these
cases we just invalidate everything rather than tracking exact dependencies,
since these types of objects seldom change in a production database.
Tom Lane; loosely based on a patch by Martin Pihlak.
inserting a materialize node above an inner-side sort node, when the sort is
expected to spill to disk. (The materialize protects the sort from having
to support mark/restore, allowing it to do its final merge pass on-the-fly.)
We neglected to teach cost_mergejoin about that hack, so it was failing to
include the materialize's costs in the estimated cost of the mergejoin.
The materialize's costs are generally going to be pretty negligible in
comparison to the sort's, so this is only a small error and probably not
worth back-patching; but it's still wrong.
In the similar case where a materialize is inserted to protect an inner-side
node that can't do mark/restore at all, it's still true that the materialize
should not spill to disk, and so we should cost it cheaply rather than
expensively.
Noted while thinking about a question from Tom Raney.
most node types used in expression trees (both before and after parse
analysis). This allows us to place an error cursor in many situations
where we formerly could not, because the information wasn't available
beyond the very first level of parse analysis. There's a fair amount
of work still to be done to persuade individual ereport() calls to actually
include an error location, but this gets the initdb-forcing part of the
work out of the way; and the situation is already markedly better than
before for complaints about unimplementable implicit casts, such as
CASE and UNION constructs with incompatible alternative data types.
Per my proposal of a few days ago.
when its input is constant and the element coercion function is immutable
(or nonexistent, ie, binary-coercible case). This is an oversight in the
8.3 implementation of ArrayCoerceExpr, and its result is that certain cases
involving IN or NOT IN with constants don't get optimized as they should be.
Per experimentation with an example from Ow Mun Heng.
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.
The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed. The least invasive
solution seems to be to generate both plans and postpone the choice until
execution. Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.
There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected. I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
to be used for SubLinks that are underneath a top-level OR clause. Just as at
the very top level of WHERE, it's not necessary to be accurate about whether
the sublink returns FALSE or NULL, because either result has the same impact
on whether the WHERE will succeed.
eval_const_expressions will generally throw away anything that's ANDed with
constant FALSE, what we're left with given an example like
select * from tenk1 a where (unique1,0) in (select unique2,1 from tenk1 b);
is a cartesian product computation, which is really not acceptable.
This is a regression in CVS HEAD compared to previous releases, which were
able to notice the impossible join condition in this case --- though not in
some related cases that are also improved by this patch, such as
select * from tenk1 a left join tenk1 b on (a.unique1=b.unique2 and 0=1);
Fix by skipping evaluation of the appropriate side of the outer join in
cases where it's demonstrably unnecessary.
that we're considering pulling up. I hadn't wanted to think through whether
that could work during the first pass at this stuff. However, on closer
inspection it seems to be safe enough.
level of a JOIN/ON clause, not only at top level of WHERE. (However, we
can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
only to the nullable side of the outer join, so that it can effectively
be pushed down into the nullable side.) Per request from Kevin Grittner.
In passing, fix a bug in the initial implementation of EXISTS pullup:
it would Assert if the EXIST's WHERE clause used a join alias variable.
Since we haven't yet flattened join aliases when this transformation
happens, it's necessary to include join relids in the computed set of
RHS relids.
and anti joins. To do this, pass the SpecialJoinInfo struct for the current
join as an additional optional argument to operator join selectivity
estimation functions. This allows the estimator to tell not only what kind
of join is being formed, but which variable is on which side of the join;
a requirement long recognized but not dealt with till now. This also leaves
the door open for future improvements in the estimators, such as accounting
for the null-insertion effects of lower outer joins. I didn't do anything
about that in the current patch but the information is in principle deducible
from what's passed.
The patch also clarifies the definition of join selectivity for semi/anti
joins: it's the fraction of the left input that has (at least one) match
in the right input. This allows getting rid of some very fuzzy thinking
that I had committed in the original 7.4-era IN-optimization patch.
There's probably room to estimate this better than the present patch does,
but at least we know what to estimate.
Since I had to touch CREATE OPERATOR anyway to allow a variant signature
for join estimator functions, I took the opportunity to add a couple of
additional checks that were missing, per my recent message to -hackers:
* Check that estimator functions return float8;
* Require execute permission at the time of CREATE OPERATOR on the
operator's function as well as the estimator functions;
* Require ownership of any pre-existing operator that's modified by
the command.
I also moved the lookup of the functions out of OperatorCreate() and
into operatorcmds.c, since that seemed more consistent with most of
the other catalog object creation processes, eg CREATE TYPE.
parent, not only those with RangeTblRefs. We need them in ExecCheckRTPerms.
Report by Brendan O'Shea. Back-patch to 8.2, where pull_up_simple_union_all
was introduced.
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
hashtable entries for tuples that are found only in the second input: they
can never contribute to the output. Furthermore, this implies that the
planner should endeavor to put first the smaller (in number of groups) input
relation for an INTERSECT. Implement that, and upgrade prepunion's estimation
of the number of rows returned by setops so that there's some amount of sanity
in the estimate of which one is smaller.
This completes my project of improving usage of hashing for duplicate
elimination (aggregate functions with DISTINCT remain undone, but that's
for some other day).
As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes
that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER
BY are no longer certain to produce sorted output.
but seem like a separate patch since most of the remaining work is on the
executor side.) I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs. Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.
As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
sure that DISTINCT ON does what it's supposed to, ie, sort by the full ORDER
BY list before unique-ifying. The error seems masked in simple cases by the
fact that query_planner won't return query pathkeys that only partially match
the requested sort order, but I wouldn't want to bet that it couldn't be
exposed in some way or other.
as methods for implementing the DISTINCT step. This eliminates the former
performance gap between DISTINCT and GROUP BY, and also makes it possible
to do SELECT DISTINCT on datatypes that only support hashing not sorting.
SELECT DISTINCT ON is still always implemented by sorting; it would take
executor changes to support hashing that, and it's not clear it's worth
the trouble.
This is a release-note-worthy incompatibility from previous PG versions,
since SELECT DISTINCT can no longer be counted on to deliver sorted output
without explicitly saying ORDER BY. (Anyone who can't cope with that
can consider turning off enable_hashagg.)
Several regression test queries needed to have ORDER BY added to preserve
stable output order. I fixed the ones that manifested here, but there
might be some other cases that show up on other platforms.
sorting. The infrastructure for this was all in place already; it's only
necessary to fix the planner to not assume that sorting is always an available
option.
as per my recent proposal:
1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().
2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator. This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning. In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.
3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON. This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.
This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
to represent DISTINCT or DISTINCT ON. This gets rid of a longstanding
annoyance that a view or rule using SELECT DISTINCT will be dumped out
with an overspecified ORDER BY list, and is one small step along the way
to decoupling DISTINCT and ORDER BY enough so that hash-based implementation
of DISTINCT will be possible. In passing, improve transformDistinctClause
so that it doesn't reject duplicate DISTINCT ON items, as was reported by
Steve Midgley a couple weeks ago.
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.
I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
the current query level that aren't in fact output parameters of the current
initPlans. (This means, for example, output parameters of regular subplans.)
To make this work correctly for output parameters coming from sibling
initplans requires rejiggering the API of SS_finalize_plan just a bit:
we need the siblings to be visible to it, rather than hidden as
SS_make_initplan_from_plan had been doing. This is really part of my response
to bug #4290, but I concluded this part probably shouldn't be back-patched,
since all that it's doing is to make a debugging cross-check tighter.
bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.
I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.
Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.
of any lower outer join, even if it also references the non-nullable side and
so could not get pushed below the outer join anyway. We need this in case
the clause is an OR clause: if it doesn't get marked outerjoin_delayed,
create_or_index_quals() could pull an indexable restriction for the nullable
side out of it, leading to wrong results as demonstrated by today's bug
report from toruvinn. (See added regression test case for an example.)
In principle this has been wrong for quite a while. In practice I don't
think any branch before 8.3 can really show the failure, because
create_or_index_quals() will only pull out indexable conditions, and before
8.3 those were always strict. So though we might have improperly generated
null-extended rows in the outer join, they'd get discarded from the result
anyway. The gating factor that makes the failure visible is that 8.3
considers "col IS NULL" to be indexable. Hence I'm not going to risk
back-patching further than 8.3.
taking the maximum of any child rel's width, we should weight the widths
proportionally to the number of rows expected from each child. In hindsight
this is obviously correct because row width is really a proxy for the total
physical size of the relation. Per discussion with Scott Carey (bug #4264).
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
that it depends on for replan-forcing purposes. We need to consider plain OID
constants too, because eval_const_expressions folds a RelabelType atop a Const
to just a Const. This change could result in OID values that aren't really
for tables getting added to the dependency list, but the worst-case
consequence would be occasional useless replans. Per report from Gabriele
Messineo.
the associated datatype as their equality member. This means that these
opclasses can now support plain equality comparisons along with LIKE tests,
thus avoiding the need for an extra index in some applications. This
optimization was not possible when the pattern opclasses were first introduced,
because we didn't insist that text equality meant bitwise equality; but we
do now, so there is no semantic difference between regular and pattern
equality operators.
I removed the name_pattern_ops opclass altogether, since it's really useless:
name's regular comparisons are just strcmp() and are unlikely to become
something different. Instead teach indxpath.c that btree name_ops can be
used for LIKE whether or not the locale is C. This might lead to a useful
speedup in LIKE queries on the system catalogs in non-C locales.
The ~=~ and ~<>~ operators are gone altogether. (It would have been nice to
keep them for backward compatibility's sake, but since the pg_amop structure
doesn't allow multiple equality operators per opclass, there's no way.)
A not-immediately-obvious incompatibility is that the sort order within
bpchar_pattern_ops indexes changes --- it had been identical to plain
strcmp, but is now trailing-blank-insensitive. This will impact
in-place upgrades, if those ever happen.
Per discussions a couple months ago.
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
output is not of the same type that's needed for the IN comparison (ie,
where the parser inserted an implicit coercion above the subselect result).
We should record the coerced expression, not just a raw Var referencing
the subselect output, as the quantity that needs to be unique-ified if
we choose to implement the IN as Unique followed by a plain join.
As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier
Hernandez, because the executor was being told to hash or sort the raw
subselect output column using operators appropriate to the coerced type.
In prior versions there was no crash because the executor chose the
hash or sort operators for itself based on the column type it saw.
However, that's still not really right, because what's unique for one data
type might not be unique for another. In corner cases we could get multiple
outputs of a row that should appear only once, as demonstrated by the
regression test case included in this commit.
However, this patch doesn't apply cleanly to 8.2 or before, and the code
involved has shifted enough over time that I'm hesitant to try to back-patch.
Given the lack of complaints from the field about such corner cases, I think
the bug may not be important enough to risk breaking other things with a
back-patch.
where Datum is 8 bytes wide. Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior. Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.
Zoltan Boszormenyi, plus configurability stuff by me.
we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.
(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc. The tlist has to be
selected properly at the calling level before we start constructing sort-col
information. This is easy enough to do, we were just failing to take the
point into consideration.)
Back-patch to 8.3. I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time. The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
because in some cases we want it to substitute values for Param nodes.
(So "constant" is not so constant as all that ...) This mistake partially
disabled optimization of unnamed extended-Query statements in 8.3: in
particular the LIKE-to-indexscan optimization would never be applied if the
LIKE pattern was passed as a parameter, and constraint exclusion depending
on a parameter value didn't work either.
useless for an ungrouped-aggregate query holds regardless of whether
optimize_minmax_aggregates succeeds. So we might as well apply the
optimization in any case.
I'll leave 8.3 as it was, since this version is a tad more invasive
than my earlier patch.
the query result must be exactly one row (since we don't do this when there's
any GROUP BY). Therefore any ORDER BY or DISTINCT attached to the query is
useless and can be dropped. Aside from saving useless cycles, this protects
us against problems with matching the hacked-up tlist entries to sort clauses,
as seen in a bug report from Taiki Yamaguchi. We might need to work harder
if we ever try to optimize grouped queries with this approach, but this
solution will do for now.