Commit Graph

101 Commits

Author SHA1 Message Date
Tom Lane f69b4b9495 Fix some planner issues with degenerate outer join clauses.
An outer join clause that didn't actually reference the RHS (perhaps only
after constant-folding) could confuse the join order enforcement logic,
leading to wrong query results.  Also, nested occurrences of such things
could trigger an Assertion that on reflection seems incorrect.

Per fuzz testing by Andreas Seltenreich.  The practical use of such cases
seems thin enough that it's not too surprising we've not heard field
reports about it.

This has been broken for a long time, so back-patch to all active branches.
2015-08-01 20:57:41 -04:00
Tom Lane a6492ff897 Fix an oversight in checking whether a join with LATERAL refs is legal.
In many cases, we can implement a semijoin as a plain innerjoin by first
passing the righthand-side relation through a unique-ification step.
However, one of the cases where this does NOT work is where the RHS has
a LATERAL reference to the LHS; that makes the RHS dependent on the LHS
so that unique-ification is meaningless.  joinpath.c understood this,
and so would not generate any join paths of this kind ... but join_is_legal
neglected to check for the case, so it would think that we could do it.
The upshot would be a "could not devise a query plan for the given query"
failure once we had failed to generate any join paths at all for the bogus
join pair.

Back-patch to 9.3 where LATERAL was added.
2015-07-31 19:26:33 -04:00
Tom Lane 95f4e59c32 Remove an unsafe Assert, and explain join_clause_is_movable_into() better.
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level.  This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact.  Comment out the Assert, and add a bunch of commentary
to clarify what's going on.

Per fuzz testing by Andreas Seltenreich.  The added regression test is
a pretty silly query, but it's based on his crasher example.

Back-patch to 9.2 where the faulty logic was introduced.
2015-07-28 13:20:39 -04:00
Tom Lane fca8e59c1c Fix oversight in flattening of subqueries with empty FROM.
I missed a restriction that commit f4abd0241d
should have enforced: we can't pull up an empty-FROM subquery if it's under
an outer join, because then we'd need to wrap its output columns in
PlaceHolderVars.  As the code currently stands, the PHVs end up with empty
relid sets, which doesn't work (and is correctly caught by an Assert).

It's possible that this could be fixed by assigning the PHVs the relid
sets of the parent FromExpr/JoinExpr, but getting that to work is more
complication than I care to add right now; indeed it's likely that
we'll never bother, since pulling up empty-FROM subqueries is a rather
marginal optimization anyway.

Per report from Andreas Seltenreich.  Back-patch to 9.5 where the faulty
code was added.
2015-07-26 17:44:27 -04:00
Tom Lane 358eaa01bf Make entirely-dummy appendrels get marked as such in set_append_rel_size.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL).  When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount.  Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist.  (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)

In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.

Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.

Per report from Andreas Seltenreich.  Back-patch to 9.2.  Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist.  It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
2015-07-26 16:19:08 -04:00
Tom Lane 3b0f77601b Fix some questionable edge-case behaviors in add_path() and friends.
add_path_precheck was doing exact comparisons of path costs, but it really
needs to do them fuzzily to be sure it won't reject paths that could
survive add_path's comparisons.  (This can only matter if the initial cost
estimate is very close to the final one, but that turns out to often be
true.)

Also, it should ignore startup cost for this purpose if and only if
compare_path_costs_fuzzily would do so.  The previous coding always ignored
startup cost for parameterized paths, which is wrong as of commit
3f59be836c555fa6; it could result in improper early rejection of paths that
we care about for SEMI/ANTI joins.  It also always considered startup cost
for unparameterized paths, which is just as wrong though the only effect is
to waste planner cycles on paths that can't survive.  Instead, it should
consider startup cost only when directed to by the consider_startup/
consider_param_startup relation flags.

Likewise, compare_path_costs_fuzzily should have symmetrical behavior
for parameterized and unparameterized paths.  In this case, the best
answer seems to be that after establishing that total costs are fuzzily
equal, we should compare startup costs whether or not the consider_xxx
flags are on.  That is what it's always done for unparameterized paths,
so let's make the behavior for parameterized  paths match.

These issues were noted while developing the SEMI/ANTI join costing fix
of commit 3f59be836c, but we chose not to back-patch these fixes,
because they can cause changes in the planner's choices among
nearly-same-cost plans.  (There is in fact one minor change in plan choice
within the core regression tests.)  Destabilizing plan choices in back
branches without very clear improvements is frowned on, so we'll just fix
this in HEAD.
2015-06-03 18:02:39 -04:00
Tom Lane 3cf8686014 Prevent improper reordering of antijoins vs. outer joins.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that.  In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks.  It's broken as far back as we had antijoins, so back-patch to all
supported branches.
2015-04-25 16:44:27 -04:00
Tom Lane ca6805338f Fix incorrect matching of subexpressions in outer-join plan nodes.
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node.  However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.

The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes.  But we might do
so if the expressions are sort keys for a mergejoin, for example.

Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable.  For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)

Per report from Qingqing Zhou.  The added regression test case is based
on his example.

This has been broken for a very long time, so back-patch to all active
branches.
2015-04-04 19:55:15 -04:00
Tom Lane 542320c2bd Be more careful about printing constants in ruleutils.c.
The previous coding in get_const_expr() tried to avoid quoting integer,
float, and numeric literals if at all possible.  While that looks nice,
it means that dumped expressions might re-parse to something that's
semantically equivalent but not the exact same parsetree; for example
a FLOAT8 constant would re-parse as a NUMERIC constant with a cast to
FLOAT8.  Though the result would be the same after constant-folding,
this is problematic in certain contexts.  In particular, Jeff Davis
pointed out that this could cause unexpected failures in ALTER INHERIT
operations because of child tables having not-exactly-equivalent CHECK
expressions.  Therefore, favor correctness over legibility and dump
such constants in quotes except in the limited cases where they'll
be interpreted as the same type even without any casting.

This results in assorted small changes in the regression test outputs,
and will affect display of user-defined views and rules similarly.
The odds of that causing problems in the field seem non-negligible;
given the lack of previous complaints, it seems best not to change
this in the back branches.
2015-03-30 14:59:49 -04:00
Tom Lane f4abd0241d Support flattening of empty-FROM subqueries and one-row VALUES tables.
We can't handle this in the general case due to limitations of the
planner's data representations; but we can allow it in many useful cases,
by being careful to flatten only when we are pulling a single-row subquery
up into a FROM (or, equivalently, inner JOIN) node that will still have at
least one remaining relation child.  Per discussion of an example from
Kyotaro Horiguchi.
2015-03-11 23:18:03 -04:00
Tom Lane b746d0c32d Fix old bug in get_loop_count().
While poking at David Kubečka's issue I noticed an ancient logic error
in get_loop_count(): it used 1.0 as a "no data yet" indicator, but since
that is actually a valid rowcount estimate, this doesn't work.  If we
have one input relation with 1.0 as rowcount and then another one with
a larger rowcount, we should use 1.0 as the result, but we picked the
larger rowcount instead.  (I think when I coded this, I recognized the
conflict, but mistakenly thought that the logic would pick the desired
count anyway.)

Fixing this changed the plan for one existing regression test case.
Since the point of that test is to exercise creation of a particular
shape of nestloop plan, I tweaked the query a little bit so it still
results in the same plan choice.

This is definitely a bug, but I'm hesitant to back-patch since it might
change plan choices unexpectedly, and anyway failure to implement a
heuristic precisely as intended is a pretty low-grade bug.
2015-03-11 22:53:32 -04:00
Robert Haas e529cd4ffa Suggest to the user the column they may have meant to reference.
Error messages informing the user that no such column exists can
sometimes provoke a perplexed response.  This often happens due to
a subtle typo in the column name or, perhaps less likely, in the
alias name.  To speed discovery of what the real issue is in such
cases, we'll now search the range table for approximate matches.
If there are one or two such matches that are good enough to think
that they might be what the user intended to type, and better than
all other approximate matches, we'll issue a hint suggesting that
the user might have intended to reference those columns.

Peter Geoghegan and Robert Haas
2015-03-11 10:44:04 -04:00
Tom Lane b514a7460d Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle
star-schema queries efficiently, but some overly-restrictive search
limiting logic added in commit e2fa76d80b
prevented such cases from working as desired.  Fix that and add a
regression test about it.  Per gripe from Marc Cousin.

This is arguably a bug rather than a new feature, so back-patch to 9.2
where parameterized paths were introduced.
2015-02-28 12:43:04 -05:00
Tom Lane 1b4cc493d2 Preserve AND/OR flatness while extracting restriction OR clauses.
The code I added in commit f343a880d5 was
careless about preserving AND/OR flatness: it could create a structure with
an OR node directly underneath another one.  That breaks an assumption
that's fairly important for planning efficiency, not to mention triggering
various Asserts (as reported by Benjamin Smith).  Add a trifle more logic
to handle the case properly.
2014-09-09 18:35:31 -04:00
Tom Lane f15821eefd Allow join removal in some cases involving a left join to a subquery.
We can remove a left join to a relation if the relation's output is
provably distinct for the columns involved in the join clause (considering
only equijoin clauses) and the relation supplies no variables needed above
the join.  Previously, the join removal logic could only prove distinctness
by reference to unique indexes of a table.  This patch extends the logic
to consider subquery relations, wherein distinctness might be proven by
reference to GROUP BY, DISTINCT, etc.

We actually already had some code to check that a subquery's output was
provably distinct, but it was hidden inside pathnode.c; which was a pretty
bad place for it really, since that file is mostly boilerplate Path
construction and comparison.  Move that code to analyzejoins.c, which is
arguably a more appropriate location, and is certainly the site of the
new usage for it.

David Rowley, reviewed by Simon Riggs
2014-07-15 21:12:43 -04:00
Tom Lane ab76208e3d Forward-port regression test for bug #10587 into 9.3 and HEAD.
Although this bug is already fixed in post-9.2 branches, the case
triggering it is quite different from what was under consideration
at the time.  It seems worth memorializing this example in HEAD
just to make sure it doesn't get broken again in future.

Extracted from commit 187ae17300.
2014-06-09 21:37:18 -04:00
Tom Lane a16d421ca4 Revert "Auto-tune effective_cache size to be 4x shared buffers"
This reverts commit ee1e5662d8, as well as
a remarkably large number of followup commits, which were mostly concerned
with the fact that the implementation didn't work terribly well.  It still
doesn't: we probably need some rather basic work in the GUC infrastructure
if we want to fully support GUCs whose default varies depending on the
value of another GUC.  Meanwhile, it also emerged that there wasn't really
consensus in favor of the definition the patch tried to implement (ie,
effective_cache_size should default to 4 times shared_buffers).  So whack
it all back to where it was.  In a followup commit, I'll do what was
recently agreed to, which is to simply change the default to a higher
value.
2014-05-08 20:49:38 -04:00
Tom Lane 043f6ff05d Fix bogus handling of "postponed" lateral quals.
When pulling a "postponed" qual from a LATERAL subquery up into the quals
of an outer join, we must make sure that the postponed qual is included
in those seen by make_outerjoininfo().  Otherwise we might compute a
too-small min_lefthand or min_righthand for the outer join, leading to
"JOIN qualification cannot refer to other relations" failures from
distribute_qual_to_rels.  Subtler errors in the created plan seem possible,
too, if the extra qual would only affect join ordering constraints.

Per bug #9041 from David Leverton.  Back-patch to 9.3.
2014-01-30 14:51:16 -05:00
Tom Lane 158b7fa6a3 Disallow LATERAL references to the target table of an UPDATE/DELETE.
On second thought, commit 0c051c9008 was
over-hasty: rather than allowing this case, we ought to reject it for now.
That leaves the field clear for a future feature that allows the target
table to be re-specified in the FROM (or USING) clause, which will enable
left-joining the target table to something else.  We can then also allow
LATERAL references to such an explicitly re-specified target table.
But allowing them right now will create ambiguities or worse for such a
feature, and it isn't something we documented 9.3 as supporting.

While at it, add a convenience subroutine to avoid having several copies
of the ereport for disalllowed-LATERAL-reference cases.
2014-01-11 19:03:12 -05:00
Tom Lane 0c051c9008 Fix LATERAL references to target table of UPDATE/DELETE.
I failed to think much about UPDATE/DELETE when implementing LATERAL :-(.
The implemented behavior ended up being that subqueries in the FROM or
USING clause (respectively) could access the update/delete target table as
though it were a lateral reference; which seems fine if they said LATERAL,
but certainly ought to draw an error if they didn't.  Fix it so you get a
suitable error when you omit LATERAL.  Per report from Emre Hasegeli.
2014-01-07 15:25:27 -05:00
Tom Lane f343a880d5 Extract restriction OR clauses whether or not they are indexable.
It's possible to extract a restriction OR clause from a join clause that
has the form of an OR-of-ANDs, if each sub-AND includes a clause that
mentions only one specific relation.  While PG has been aware of that idea
for many years, the code previously only did it if it could extract an
indexable OR clause.  On reflection, though, that seems a silly limitation:
adding a restriction clause can be a win by reducing the number of rows
that have to be filtered at the join step, even if we have to test the
clause as a plain filter clause during the scan.  This should be especially
useful for foreign tables, where the change can cut the number of rows that
have to be retrieved from the foreign server; but testing shows it can win
even on local tables.  Per a suggestion from Robert Haas.

As a heuristic, I made the code accept an extracted restriction clause
if its estimated selectivity is less than 0.9, which will probably result
in accepting extracted clauses just about always.  We might need to tweak
that later based on experience.

Since the code no longer has even a weak connection to Path creation,
remove orindxpath.c and create a new file optimizer/util/orclauses.c.

There's some additional janitorial cleanup of now-dead code that needs
to happen, but it seems like that's a fit subject for a separate commit.
2013-12-30 12:24:37 -05:00
Tom Lane b5e0a2a384 Tweak placement of explicit ANALYZE commands in the regression tests.
Make the COPY test, which loads most of the large static tables used in
the tests, also explicitly ANALYZE those tables.  This allows us to get
rid of various ad-hoc, and rather redundant, ANALYZE commands that had
gotten stuck into various test scripts over time to ensure we got
consistent plan choices.  (We could have done a database-wide ANALYZE,
but that would cause stats to get attached to the small static tables
too, which results in plan changes compared to the historical behavior.
I'm not sure that's a good idea, so not going that far for now.)

Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing
regression tests for lack of an "ANALYZE tenk1" in the subselect test.
There's no need for this in 8.4 since we didn't print any plans back
then.
2013-12-11 15:09:15 -05:00
Tom Lane f19e92ed04 Flatten join alias Vars before pulling up targetlist items from a subquery.
pullup_replace_vars()'s decisions about whether a pulled-up replacement
expression needs to be wrapped in a PlaceHolderVar depend on the assumption
that what looks like a Var behaves like a Var.  However, if the Var is a
join alias reference, later flattening of join aliases might replace the
Var with something that's not a Var at all, and should have been wrapped.

To fix, do a forcible pass of flatten_join_alias_vars() on the subquery
targetlist before we start to copy items out of it.  We'll re-run that
processing on the pulled-up expressions later, but that's harmless.

Per report from Ken Tanzer; the added regression test case is based on his
example.  This bug has been there since the PlaceHolderVar mechanism was
invented, but has escaped detection because the circumstances that trigger
it are fairly narrow.  You need a flattenable query underneath an outer
join, which contains another flattenable query inside a join of its own,
with a dangerous expression (a constant or something else non-strict)
in that one's targetlist.

Having seen this, I'm wondering if it wouldn't be prudent to do all
alias-variable flattening earlier, perhaps even in the rewriter.
But that would probably not be a back-patchable change.
2013-11-22 14:37:21 -05:00
Tom Lane f3b3b8d5be Compute correct em_nullable_relids in get_eclass_for_sort_expr().
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates.  I'd worried about this in the commit message
for db9f0e1d9a, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs.  That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.

Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc.  We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times.  In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins.  There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though.  There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.

Back-patch to 9.2, like the previous patch in this area.
2013-11-15 16:46:18 -05:00
Tom Lane 648bd05b13 Re-allow duplicate aliases within aliased JOINs.
Although the SQL spec forbids duplicate table aliases, historically
we've allowed queries like
    SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z
on the grounds that the aliased join (z) hides the aliases within it,
therefore there is no conflict between the two RTEs named "x".  The
LATERAL patch broke this, on the misguided basis that "x" could be
ambiguous if tab3 were a LATERAL subquery.  To avoid breaking existing
queries, it's better to allow this situation and complain only if
tab3 actually does contain an ambiguous reference.  We need only remove
the check that was throwing an error, because the column lookup code
is already prepared to handle ambiguous references.  Per bug #8444.
2013-11-11 10:42:57 -05:00
Bruce Momjian ee1e5662d8 Auto-tune effective_cache size to be 4x shared buffers 2013-10-08 12:12:24 -04:00
Tom Lane c64de21e96 Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries.
In an example such as
SELECT * FROM
  i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
it is safe to pull up the LATERAL subquery into its parent, but we must
then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN.  The
previous coding in deconstruct_recurse mistakenly labeled the clause as
"is_pushed_down", resulting in wrong semantics if the clause were applied
at the join node, as per an example submitted awhile ago by Jeremy Evans.
To fix, postpone processing of such clauses until we return back up to
the appropriate recursion depth in deconstruct_recurse.

In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery;
we previously missed the possibility that the LATERAL subquery might itself
contain an outer join that makes lateral references in lower quals unsafe.

A regression test case equivalent to Jeremy's example was already in my
commit of yesterday, but was giving the wrong results because of this
bug.  This patch fixes the expected output for that, and also adds a
test case for the second problem.
2013-08-19 13:19:41 -04:00
Tom Lane 9e7e29c75a Fix planner problems with LATERAL references in PlaceHolderVars.
The planner largely failed to consider the possibility that a
PlaceHolderVar's expression might contain a lateral reference to a Var
coming from somewhere outside the PHV's syntactic scope.  We had a previous
report of a problem in this area, which I tried to fix in a quick-hack way
in commit 4da6439bd8, but Antonin Houska
pointed out that there were still some problems, and investigation turned
up other issues.  This patch largely reverts that commit in favor of a more
thoroughly thought-through solution.  The new theory is that a PHV's
ph_eval_at level cannot be higher than its original syntactic level.  If it
contains lateral references, those don't change the ph_eval_at level, but
rather they create a lateral-reference requirement for the ph_eval_at join
relation.  The code in joinpath.c needs to handle that.

Another issue is that createplan.c wasn't handling nested PlaceHolderVars
properly.

In passing, push knowledge of lateral-reference checks for join clauses
into join_clause_is_movable_to.  This is mainly so that FDWs don't need
to deal with it.

This patch doesn't fix the original join-qual-placement problem reported by
Jeremy Evans (and indeed, one of the new regression test cases shows the
wrong answer because of that).  But the PlaceHolderVar problems need to be
fixed before that issue can be addressed, so committing this separately
seems reasonable.
2013-08-17 20:22:37 -04:00
Tom Lane 1b1d3d92c3 Remove ph_may_need from PlaceHolderInfo, with attendant simplifications.
The planner logic that attempted to make a preliminary estimate of the
ph_needed levels for PlaceHolderVars seems to be completely broken by
lateral references.  Fortunately, the potential join order optimization
that this code supported seems to be of relatively little value in
practice; so let's just get rid of it rather than trying to fix it.

Getting rid of this allows fairly substantial simplifications in
placeholder.c, too, so planning in such cases should be a bit faster.

Issue noted while pursuing bugs reported by Jeremy Evans and Antonin
Houska, though this doesn't in itself fix either of their reported cases.
What this does do is prevent an Assert crash in the kind of query
illustrated by the added regression test.  (I'm not sure that the plan for
that query is stable enough across platforms to be usable as a regression
test output ... but we'll soon find out from the buildfarm.)

Back-patch to 9.3.  The problem case can't arise without LATERAL, so
no need to touch older branches.
2013-08-14 18:38:47 -04:00
Tom Lane db9f0e1d9a Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.

The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner.  However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.

There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner).  I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series.  This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.

I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test.  The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids.  I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical.  So for the moment, do this and
stop here.

Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct.  (We might have to revisit that choice
if any related bugs turn up.)  In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 14:50:03 -04:00
Tom Lane 2378d79ab2 Make LATERAL implicit for functions in FROM.
The SQL standard does not have general functions-in-FROM, but it does
allow UNNEST() there (see the <collection derived table> production),
and the semantics of that are defined to include lateral references.
So spec compliance requires allowing lateral references within UNNEST()
even without an explicit LATERAL keyword.  Rather than making UNNEST()
a special case, it seems best to extend this flexibility to any
function-in-FROM.  We'll still allow LATERAL to be written explicitly
for clarity's sake, but it's now a noise word in this context.

In theory this change could result in a change in behavior of existing
queries, by allowing what had been an outer reference in a function-in-FROM
to be captured by an earlier FROM-item at the same level.  However, all
pre-9.3 PG releases have a bug that causes them to match variable
references to earlier FROM-items in preference to outer references (and
then throw an error).  So no previously-working query could contain the
type of ambiguity that would risk a change of behavior.

Per a suggestion from Andrew Gierth, though I didn't use his patch.
2013-01-26 16:18:42 -05:00
Tom Lane 72a4231f0c Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable.  (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.)  The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence.  Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs.  This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places.  (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses.  Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.

Add regression tests illustrating these problems.  In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:30:10 -04:00
Tom Lane 3b8968f252 Rethink heuristics for choosing index quals for parameterized paths.
Some experimentation with examples similar to bug #7539 has convinced me
that indxpath.c's original implementation of parameterized-path generation
was several bricks shy of a load.  In general, if we are relying on a
particular outer rel or set of outer rels for a parameterized path, the
path should use every indexable join clause that's available from that rel
or rels.  Any join clauses that get left out of the indexqual will end up
getting applied as plain filter quals (qpquals), and that's generally a
significant loser compared to having the index AM enforce them.  (This is
particularly true with btree, which can skip the index scan entirely if
it can see that the given indexquals are mutually contradictory.)  The
original heuristics failed to ensure this, though, and were overly
complicated anyway.  Rewrite to make the code explicitly identify each
useful set of outer rels and then select all applicable join clauses for
each one.  The one plan that changes in the regression tests is in fact
for the better according to the planner's cost estimates.

(Note: this is not a correctness issue but just a matter of plan quality.
I don't yet know what is going on in bug #7539, but I don't expect this
change to fix that.)
2012-09-16 17:58:09 -04:00
Tom Lane 6d2c8c0e2a Drop cheap-startup-cost paths during add_path() if we don't need them.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero).  If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.

Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting.  But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.

Per my suggestion of a few months ago.
2012-09-01 18:16:24 -04:00
Tom Lane 4da6439bd8 Fix mark_placeholder_maybe_needed to handle LATERAL references.
If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum
possible evaluation level might be higher in the join tree than its
original syntactic location.  That in turn affects the ph_needed level for
any contained PlaceHolderVars (that is, those PHVs had better propagate up
the join tree at least to the evaluation level of the outer PHV).  We got
this mostly right, but mark_placeholder_maybe_needed() failed to account
for the effect, and in consequence could leave the inner PHVs with
ph_may_need less than what their ultimate ph_needed value will be.  That's
bad because it could lead to failure to select a join order that will allow
evaluation of the inner PHV at a valid location.  Fix that, and add an
Assert that checks that we don't ever set ph_needed to more than
ph_may_need.
2012-09-01 13:56:46 -04:00
Tom Lane da3df99870 Fix LATERAL references to join alias variables.
I had thought this case worked already, but perhaps I didn't re-test it
after adding extract_lateral_references() ...
2012-08-31 17:44:31 -04:00
Tom Lane d1a4db8d25 Improve EXPLAIN's ability to cope with LATERAL references in plans.
push_child_plan/pop_child_plan didn't bother to adjust the "ancestors"
list of parent plan nodes when descending to a child plan node.  I think
this was okay when it was written, but it's not okay in the presence of
LATERAL references, since a subplan node could easily be returning a
LATERAL value back up to the same nestloop node that provides the value.
Per changed regression test results, the omission led to failure to
interpret Param nodes that have perfectly good interpretations.
2012-08-30 12:56:50 -04:00
Tom Lane e83bb10d6d Adjust definition of cheapest_total_path to work better with LATERAL.
In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths.  It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization.  (The first rule is actually
a special case of the second.)

This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available.  This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.

Along the way, simplify management of parameterized paths in add_path()
and friends.  In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible.  But we didn't take that reasoning as far as
we should have.  Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well.  Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels.  So this
should be a bit faster as well as simpler.
2012-08-29 22:06:07 -04:00
Tom Lane 9ff79b9d4e Fix up planner infrastructure to support LATERAL properly.
This patch takes care of a number of problems having to do with failure
to choose valid join orders and incorrect handling of lateral references
pulled up from subqueries.  Notable changes:

* Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to
represent join ordering constraints created by lateral references.
(I first considered extending the SpecialJoinInfo structure, but the
semantics are different enough that a separate data structure seems
better.)  Extend join_is_legal() and related functions to prevent trying
to form unworkable joins, and to ensure that we will consider joins that
satisfy lateral references even if the joins would be clauseless.

* Fill in the infrastructure needed for the last few types of relation scan
paths to support parameterization.  We'd have wanted this eventually
anyway, but it is necessary now because a relation that gets pulled up out
of a UNION ALL subquery may acquire a reltargetlist containing lateral
references, meaning that its paths *have* to be parameterized whether or
not we have any code that can push join quals down into the scan.

* Compute data about lateral references early in query_planner(), and save
in RelOptInfo nodes, to avoid repetitive calculations later.

* Assorted corner-case bug fixes.

There's probably still some bugs left, but this is a lot closer to being
real than it was before.
2012-08-26 22:50:23 -04:00
Tom Lane 084a29c94f Another round of planner fixes for LATERAL.
Formerly, subquery pullup had no need to examine other entries in the range
table, since they could not contain any references to the subquery being
pulled up.  That's no longer true with LATERAL, so now we need to be able
to visit rangetable subexpressions to replace Vars referencing the
pulled-up subquery.  Also, this means that extract_lateral_references must
be unsurprised at encountering lateral PlaceHolderVars, since such might be
created when pulling up a subquery that's underneath an outer join with
respect to the lateral reference.
2012-08-18 14:10:17 -04:00
Tom Lane f5983923d8 Allow create_index_paths() to consider multiple join bitmapscan paths.
In the initial cut at the "parameterized paths" feature, I'd simplified
create_index_paths() to the point where it would only generate a single
parameterized bitmap path per relation.  Experimentation with an example
supplied by Josh Berkus convinces me that that's not good enough: we really
need to consider a bitmap path for each possible outer relation.  Otherwise
we have regressions relative to pre-9.2 versions, in which the planner
picks a plain indexscan where it should have used a bitmap scan in queries
involving three or more tables.  Indeed, after fixing this, several queries
in the regression tests show improved plans as a result of using bitmap not
plain indexscans.
2012-08-16 13:03:54 -04:00
Tom Lane c1774d2c81 More fixes for planner's handling of LATERAL.
Re-allow subquery pullup for LATERAL subqueries, except when the subquery
is below an outer join and contains lateral references to relations outside
that outer join.  If we pull up in such a case, we risk introducing lateral
cross-references into outer joins' ON quals, which is something the code is
entirely unprepared to cope with right now; and I'm not sure it'll ever be
worth coping with.

Support lateral refs in VALUES (this seems to be the only additional path
type that needs such support as a consequence of re-allowing subquery
pullup).

Put in a slightly hacky fix for joinpath.c's refusal to consider
parameterized join paths even when there cannot be any unparameterized
ones.  This was causing "could not devise a query plan for the given query"
failures in queries involving more than two FROM items.

Put in an even more hacky fix for distribute_qual_to_rels() being unhappy
with join quals that contain references to rels outside their syntactic
scope; which is to say, disable that test altogether.  Need to think about
how to preserve some sort of debugging cross-check here, while not
expending more cycles than befits a debugging cross-check.
2012-08-12 16:01:26 -04:00
Tom Lane e76af54137 Fix some issues with LATERAL(SELECT UNION ALL SELECT).
The LATERAL marking has to be propagated down to the UNION leaf queries
when we pull them up.  Also, fix the formerly stubbed-off
set_append_rel_pathlist().  It does already have enough smarts to cope with
making a parameterized Append path at need; it just has to not assume that
there *must* be an unparameterized path.
2012-08-11 18:42:56 -04:00
Tom Lane eaccfded98 Centralize the logic for detecting misplaced aggregates, window funcs, etc.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree.  To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed.  This allows removal of a large number of ad-hoc
checks scattered around the code base.  The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.

Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.

Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.

In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone.  (I didn't risk actually removing said dead code, though.)
2012-08-10 11:36:15 -04:00
Tom Lane 5ebaaa4944 Implement SQL-standard LATERAL subqueries.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.

The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned.  The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags.  Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.

In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason.  (It was mainly
add_missing_from that caused so much fudging here in the first place.)

The planner support for this feature is very minimal, and will be improved
in future patches.  It works well enough for testing purposes, though.

catversion bump forced due to new field in RangeTblEntry.
2012-08-07 19:02:54 -04:00
Robert Haas d7c734841b Reduce messages about implicit indexes and sequences to DEBUG1.
Per recent discussion on pgsql-hackers, these messages are too
chatty for most users.
2012-07-04 20:35:29 -04:00
Tom Lane 33e99153e9 Use fuzzy not exact cost comparison for the final tie-breaker in add_path.
Instead of an exact cost comparison, use a fuzzy comparison with 1e-10
delta after all other path metrics have proved equal.  This is to avoid
having platform-specific roundoff behaviors determine the choice when
two paths are really the same to our cost estimators.  Adjust the
recently-added test case that made it obvious we had a problem here.
2012-04-21 00:51:14 -04:00
Tom Lane 5b7b5518d0 Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate.  We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.

In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage.  This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.

To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing.  This is required at both base scans and joins.  It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree.  Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 15:53:47 -04:00
Tom Lane e3ffd05b02 Weaken the planner's tests for relevant joinclauses.
We should be willing to cross-join two small relations if that allows us
to use an inner indexscan on a large relation (that is, the potential
indexqual for the large table requires both smaller relations).  This
worked in simple cases but fell apart as soon as there was a join clause
to a fourth relation, because the existence of any two-relation join clause
caused the planner to not consider clauseless joins between other base
relations.  The added regression test shows an example case adapted from
a recent complaint from Benoit Delbosc.

Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and
has_relevant_eclass_joinclause to consider that a join clause mentioning
three or more relations is sufficient grounds for joining any subset of
those relations, even if we have to do so via a cartesian join.  Since such
clauses are relatively uncommon, this shouldn't affect planning speed on
typical queries; in fact it should help a bit, because the latter two
functions in particular get significantly simpler.

Although this is arguably a bug fix, I'm not going to risk back-patching
it, since it might have currently-unforeseen consequences.
2012-04-13 16:07:17 -04:00
Tom Lane 8279eb4191 Fix planner's handling of outer PlaceHolderVars within subqueries.
For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible.  When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.

In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup.  It looks like everyplace else that touches
PlaceHolderVars got it right, though.

Per report from Mark Murawski.  In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected".  But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.
2012-03-24 16:21:39 -04:00