1996-07-09 08:22:35 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
1999-02-14 00:22:53 +01:00
|
|
|
* planmain.c
|
1997-09-07 07:04:48 +02:00
|
|
|
* Routines to plan a single query
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2000-03-21 06:12:12 +01:00
|
|
|
* What's in a name, anyway? The top-level entry point of the planner/
|
|
|
|
* optimizer is over in planner.c, not here as you might think from the
|
|
|
|
* file name. But this is the main code for planning a basic join operation,
|
|
|
|
* shorn of features like subselects, inheritance, aggregates, grouping,
|
|
|
|
* and so on. (Those are the things planner.c deals with.)
|
|
|
|
*
|
2019-01-02 18:44:25 +01:00
|
|
|
* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
|
2000-01-26 06:58:53 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/optimizer/plan/planmain.c
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
2019-01-10 18:54:31 +01:00
|
|
|
#include "optimizer/appendinfo.h"
|
Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker. Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation. That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL. The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now. Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0. Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 15:02:52 +01:00
|
|
|
#include "optimizer/clauses.h"
|
2019-01-10 18:54:31 +01:00
|
|
|
#include "optimizer/inherit.h"
|
2019-01-29 21:48:51 +01:00
|
|
|
#include "optimizer/optimizer.h"
|
Extract restriction OR clauses whether or not they are indexable.
It's possible to extract a restriction OR clause from a join clause that
has the form of an OR-of-ANDs, if each sub-AND includes a clause that
mentions only one specific relation. While PG has been aware of that idea
for many years, the code previously only did it if it could extract an
indexable OR clause. On reflection, though, that seems a silly limitation:
adding a restriction clause can be a win by reducing the number of rows
that have to be filtered at the join step, even if we have to test the
clause as a plain filter clause during the scan. This should be especially
useful for foreign tables, where the change can cut the number of rows that
have to be retrieved from the foreign server; but testing shows it can win
even on local tables. Per a suggestion from Robert Haas.
As a heuristic, I made the code accept an extracted restriction clause
if its estimated selectivity is less than 0.9, which will probably result
in accepting extracted clauses just about always. We might need to tweak
that later based on experience.
Since the code no longer has even a weak connection to Path creation,
remove orindxpath.c and create a new file optimizer/util/orclauses.c.
There's some additional janitorial cleanup of now-dead code that needs
to happen, but it seems like that's a fit subject for a separate commit.
2013-12-30 18:24:37 +01:00
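The commit message above describes two checks: an OR-of-ANDs join clause yields a restriction OR clause when every sub-AND has a clause that mentions only the one target relation, and the extracted clause is accepted when its estimated selectivity is below 0.9. The sketch below illustrates both under simplifying assumptions. It is not the code in orclauses.c: the types and function names (`SketchArm`, `sketch_can_extract_or`, `sketch_accept_extracted`) are hypothetical, and relation sets are modeled as plain bitmasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one sub-AND of an OR-of-ANDs join clause. */
typedef struct SketchArm
{
	const unsigned *clause_relids;	/* one relation bitmask per AND-ed clause */
	size_t		nclauses;
} SketchArm;

/*
 * An OR clause restricted to the relation "rel_bit" can be extracted only
 * if every arm contains at least one clause that mentions that relation
 * and nothing else.
 */
bool
sketch_can_extract_or(const SketchArm *arms, size_t narms, unsigned rel_bit)
{
	for (size_t i = 0; i < narms; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < arms[i].nclauses; j++)
		{
			if (arms[i].clause_relids[j] == rel_bit)
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return narms > 0;
}

/* Heuristic from the commit message: keep the clause if selectivity < 0.9. */
bool
sketch_accept_extracted(double selectivity)
{
	return selectivity < 0.9;
}
```

For a clause shaped like `(a.x = 1 AND a.y = b.y) OR (a.x = 2 AND b.z = 3)`, every arm has a clause on `a` alone, so the sketch would report that `a.x = 1 OR a.x = 2` can be extracted for `a`.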
|
|
|
#include "optimizer/orclauses.h"
|
2000-02-15 21:49:31 +01:00
|
|
|
#include "optimizer/pathnode.h"
|
1999-07-16 07:00:38 +02:00
|
|
|
#include "optimizer/paths.h"
|
2008-10-21 22:42:53 +02:00
|
|
|
#include "optimizer/placeholder.h"
|
1996-07-09 08:22:35 +02:00
|
|
|
#include "optimizer/planmain.h"
|
|
|
|
|
|
|
|
|
2005-08-28 00:13:44 +02:00
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* query_planner
|
2002-11-06 01:00:45 +01:00
|
|
|
* Generate a path (that is, a simplified plan) for a basic query,
|
|
|
|
* which may involve joins but not any fancier features.
|
1997-09-07 07:04:48 +02:00
|
|
|
*
|
2002-11-06 01:00:45 +01:00
|
|
|
* Since query_planner does not handle the toplevel processing (grouping,
|
2014-05-06 18:12:18 +02:00
|
|
|
* sorting, etc.), it cannot select the best path by itself. Instead, it
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
* returns the RelOptInfo for the top level of joining, and the caller
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* (grouping_planner) can choose among the surviving paths for the rel.
|
2002-11-06 01:00:45 +01:00
|
|
|
*
|
2005-06-06 00:32:58 +02:00
|
|
|
* root describes the query to plan
|
|
|
|
* tlist is the target list the query should produce
|
|
|
|
* (this is NOT necessarily root->parse->targetList!)
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 20:49:01 +02:00
|
|
|
* qp_callback is a function to compute query_pathkeys once it's safe to do so
|
|
|
|
* qp_extra is optional extra data to pass to qp_callback
|
2000-02-15 21:49:31 +01:00
|
|
|
*
|
2013-04-29 20:49:01 +02:00
|
|
|
* Note: the PlannerInfo node also includes a query_pathkeys field, which
|
|
|
|
* tells query_planner the sort order that is desired in the final output
|
|
|
|
* plan. This value is *not* available at call time, but is computed by
|
|
|
|
* qp_callback once we have completed merging the query's equivalence classes.
|
|
|
|
* (We cannot construct canonical pathkeys until that's done.)
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
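The header comment above says the desired sort order is not available at call time and is computed by qp_callback once equivalence classes have been merged. A minimal sketch of that deferred-callback shape follows. Every type and name in it (`SketchRoot`, `SketchExtra`, `sketch_plan`, `sketch_compute_pathkeys`) is a hypothetical stand-in, not the planner's real structures; only the pattern of a function pointer plus an opaque `void *` argument is taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PlannerInfo: only what this sketch needs. */
typedef struct SketchRoot
{
	int			eclasses_merged;	/* nonzero once merging is complete */
	int			query_pathkeys;		/* stand-in for the real pathkeys list */
} SketchRoot;

/* Same shape as the callback parameter: planner state plus opaque data. */
typedef void (*sketch_pathkeys_callback) (SketchRoot *root, void *extra);

/* Hypothetical caller-side payload carried through the void * argument. */
typedef struct SketchExtra
{
	int			desired_sort_key;
} SketchExtra;

/* Caller-supplied callback: fills in the sort order once that is safe. */
void
sketch_compute_pathkeys(SketchRoot *root, void *extra)
{
	SketchExtra *qp = (SketchExtra *) extra;

	assert(root->eclasses_merged);
	root->query_pathkeys = qp->desired_sort_key;
}

/* Stand-in for the planning routine: merge first, then call back. */
void
sketch_plan(SketchRoot *root, sketch_pathkeys_callback cb, void *extra)
{
	root->query_pathkeys = 0;	/* not available at call time */
	root->eclasses_merged = 1;	/* pretend equivalence classes are merged */
	(*cb) (root, extra);
}
```

The point of the design is that the callee never needs to know what the extra data contains; it only passes the pointer back at the moment the callback's preconditions hold.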
|
2013-08-05 21:00:57 +02:00
|
|
|
RelOptInfo *
|
2007-05-04 03:13:45 +02:00
|
|
|
query_planner(PlannerInfo *root, List *tlist,
|
2013-08-05 21:00:57 +02:00
|
|
|
query_pathkeys_callback qp_callback, void *qp_extra)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2005-06-06 00:32:58 +02:00
|
|
|
Query *parse = root->parse;
|
2005-12-20 03:30:36 +01:00
|
|
|
List *joinlist;
|
2002-11-06 01:00:45 +01:00
|
|
|
RelOptInfo *final_rel;
|
1998-02-26 05:46:47 +01:00
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
2011-09-03 21:35:12 +02:00
|
|
|
* Init planner lists to empty.
|
2003-01-20 19:55:07 +01:00
|
|
|
*
|
2008-08-14 20:48:00 +02:00
|
|
|
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
|
2016-03-07 21:58:22 +01:00
|
|
|
* here.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
1998-08-10 04:26:40 +02:00
|
|
|
root->join_rel_list = NIL;
|
2005-06-09 01:02:05 +02:00
|
|
|
root->join_rel_hash = NULL;
|
2009-11-28 01:46:19 +01:00
|
|
|
root->join_rel_level = NULL;
|
|
|
|
root->join_cur_level = 0;
|
2007-01-20 21:45:41 +01:00
|
|
|
root->canon_pathkeys = NIL;
|
Teach planner about some cases where a restriction clause can be
propagated inside an outer join. In particular, given
LEFT JOIN ON (A = B) WHERE A = constant, we cannot conclude that
B = constant at the top level (B might be null instead), but we
can nonetheless put a restriction B = constant into the quals for
B's relation, since no inner-side rows not meeting that condition
can contribute to the final result. Similarly, given
FULL JOIN USING (J) WHERE J = constant, we can't directly conclude
that either input J variable = constant, but it's OK to push such
quals into each input rel. Per recent gripe from Kim Bisgaard.
Along the way, remove 'valid_everywhere' flag from RestrictInfo,
as on closer analysis it was not being used for anything, and was
defined backwards anyway.
2005-07-03 01:00:42 +02:00
|
|
|
root->left_join_clauses = NIL;
|
|
|
|
root->right_join_clauses = NIL;
|
|
|
|
root->full_join_clauses = NIL;
|
2008-08-14 20:48:00 +02:00
|
|
|
root->join_info_list = NIL;
|
2008-10-22 22:17:52 +02:00
|
|
|
root->placeholder_list = NIL;
|
2016-06-18 21:22:34 +02:00
|
|
|
root->fkey_list = NIL;
|
2008-01-11 05:02:18 +01:00
|
|
|
root->initial_rels = NIL;
|
1998-08-10 04:26:40 +02:00
|
|
|
|
2007-04-21 23:01:45 +02:00
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Make a flattened version of the rangetable for faster access (this is
|
2012-06-10 21:20:04 +02:00
|
|
|
* OK because the rangetable won't change any more), and set up an empty
|
|
|
|
* array for indexing base relations.
|
2007-04-21 23:01:45 +02:00
|
|
|
*/
|
2011-09-03 21:35:12 +02:00
|
|
|
setup_simple_rel_arrays(root);
|
2007-04-21 23:01:45 +02:00
|
|
|
|
In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
|
|
|
/*
|
|
|
|
* In the trivial case where the jointree is a single RTE_RESULT relation,
|
|
|
|
* bypass all the rest of this function and just make a RelOptInfo and its
|
|
|
|
* one access path. This is worth optimizing because it applies for
|
|
|
|
* common cases like "SELECT expression" and "INSERT ... VALUES()".
|
|
|
|
*/
|
|
|
|
Assert(parse->jointree->fromlist != NIL);
|
|
|
|
if (list_length(parse->jointree->fromlist) == 1)
|
|
|
|
{
|
|
|
|
Node *jtnode = (Node *) linitial(parse->jointree->fromlist);
|
|
|
|
|
|
|
|
if (IsA(jtnode, RangeTblRef))
|
|
|
|
{
|
|
|
|
int varno = ((RangeTblRef *) jtnode)->rtindex;
|
|
|
|
RangeTblEntry *rte = root->simple_rte_array[varno];
|
|
|
|
|
|
|
|
Assert(rte != NULL);
|
|
|
|
if (rte->rtekind == RTE_RESULT)
|
|
|
|
{
|
|
|
|
/* Make the RelOptInfo for it directly */
|
|
|
|
final_rel = build_simple_rel(root, varno, NULL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If query allows parallelism in general, check whether the
|
|
|
|
* quals are parallel-restricted. (We need not check
|
|
|
|
* final_rel->reltarget because it's empty at this point.
|
|
|
|
* Anything parallel-restricted in the query tlist will be
|
|
|
|
* dealt with later.) This is normally pretty silly, because
|
|
|
|
* a Result-only plan would never be interesting to
|
|
|
|
* parallelize. However, if force_parallel_mode is on, then
|
|
|
|
* we want to execute the Result in a parallel worker if
|
|
|
|
* possible, so we must do this.
|
|
|
|
*/
|
|
|
|
if (root->glob->parallelModeOK &&
|
|
|
|
force_parallel_mode != FORCE_PARALLEL_OFF)
|
|
|
|
final_rel->consider_parallel =
|
|
|
|
is_parallel_safe(root, parse->jointree->quals);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The only path for it is a trivial Result path. We cheat a
|
|
|
|
* bit here by using a GroupResultPath, because that way we
|
|
|
|
* can just jam the quals into it without preprocessing them.
|
|
|
|
* (But, if you hold your head at the right angle, a FROM-less
|
|
|
|
* SELECT is a kind of degenerate-grouping case, so it's not
|
|
|
|
* that much of a cheat.)
|
|
|
|
*/
|
|
|
|
add_path(final_rel, (Path *)
|
|
|
|
create_group_result_path(root, final_rel,
|
|
|
|
final_rel->reltarget,
|
|
|
|
(List *) parse->jointree->quals));
|
|
|
|
|
|
|
|
/* Select cheapest path (pretty easy in this case...) */
|
|
|
|
set_cheapest(final_rel);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We still are required to call qp_callback, in case it's
|
|
|
|
* something like "SELECT 2+2 ORDER BY 1".
|
|
|
|
*/
|
|
|
|
(*qp_callback) (root, qp_extra);
|
|
|
|
|
|
|
|
return final_rel;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
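The fast path above applies only when the FROM list has exactly one item, that item is a plain range-table reference, and the referenced entry is of the RTE_RESULT kind. The sketch below restates that three-part test on simplified data. The types and names (`SketchFromItem`, `SketchRteKind`, `sketch_is_trivial_result_query`) are hypothetical, and the 1-based array with an unused slot 0 is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of range-table entry kinds. */
typedef enum SketchRteKind
{
	SKETCH_RTE_RELATION,
	SKETCH_RTE_RESULT
} SketchRteKind;

/* Hypothetical model of one FROM-list item. */
typedef struct SketchFromItem
{
	bool		is_rangetblref;	/* plain table reference, not a join node */
	int			rtindex;		/* 1-based index into rte_kinds */
} SketchFromItem;

/*
 * Returns true only for the trivial case: a single FROM item that is a
 * plain reference to a result-kind entry.  rte_kinds is indexed by
 * rtindex; slot 0 is unused in this sketch.
 */
bool
sketch_is_trivial_result_query(const SketchFromItem *fromlist, size_t nfrom,
							   const SketchRteKind *rte_kinds)
{
	assert(nfrom > 0);			/* an empty FROM list is not expected here */

	if (nfrom != 1)
		return false;
	if (!fromlist[0].is_rangetblref)
		return false;
	return rte_kinds[fromlist[0].rtindex] == SKETCH_RTE_RESULT;
}
```

Anything that fails one of the three checks falls through to the general planning steps that follow.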
|
|
|
|
|
2018-06-26 16:35:26 +02:00
|
|
|
/*
|
|
|
|
* Populate append_rel_array with each AppendRelInfo to allow direct
|
|
|
|
* lookups by child relid.
|
|
|
|
*/
|
|
|
|
setup_append_rel_array(root);
|
|
|
|
|
2000-07-24 05:11:01 +02:00
|
|
|
/*
|
2006-09-20 00:49:53 +02:00
|
|
|
* Construct RelOptInfo nodes for all base relations in query, and
|
|
|
|
* indirectly for all appendrel member relations ("other rels"). This
|
2006-10-04 02:30:14 +02:00
|
|
|
* will give us a RelOptInfo for every "simple" (non-join) rel involved in
|
|
|
|
* the query.
|
2006-09-20 00:49:53 +02:00
|
|
|
*
|
|
|
|
* Note: the reason we find the rels by searching the jointree and
|
|
|
|
* appendrel list, rather than just scanning the rangetable, is that the
|
|
|
|
* rangetable may contain RTEs for rels not actively part of the query,
|
|
|
|
* for example views. We don't want to make RelOptInfos for them.
|
2000-09-12 23:07:18 +02:00
|
|
|
*/
|
2005-06-06 00:32:58 +02:00
|
|
|
add_base_rels_to_query(root, (Node *) parse->jointree);
|
2000-09-12 23:07:18 +02:00
|
|
|
|
|
|
|
/*
|
2010-09-28 18:08:56 +02:00
|
|
|
* Examine the targetlist and join tree, adding entries to baserel
|
|
|
|
* targetlists for all referenced Vars, and generating PlaceHolderInfo
|
2014-05-06 18:12:18 +02:00
|
|
|
* entries for all referenced PlaceHolderVars. Restrict and join clauses
|
2011-04-10 17:42:00 +02:00
|
|
|
* are added to appropriate lists belonging to the mentioned relations. We
|
|
|
|
* also build EquivalenceClasses for provably equivalent expressions. The
|
|
|
|
* SpecialJoinInfo list is also built to hold information about join order
|
|
|
|
* restrictions. Finally, we form a target joinlist for make_one_rel() to
|
|
|
|
* work from.
|
2000-07-24 05:11:01 +02:00
|
|
|
*/
|
2002-11-06 01:00:45 +01:00
|
|
|
build_base_rel_tlists(root, tlist);
|
2000-09-12 23:07:18 +02:00
|
|
|
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
find_placeholders_in_jointree(root);
|
2010-09-28 18:08:56 +02:00
|
|
|
|
2012-08-27 04:48:55 +02:00
|
|
|
find_lateral_references(root);
|
|
|
|
|
2005-12-20 03:30:36 +01:00
|
|
|
joinlist = deconstruct_jointree(root);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2000-07-24 05:11:01 +02:00
|
|
|
/*
|
2007-01-20 21:45:41 +01:00
|
|
|
* Reconsider any postponed outer-join quals now that we have built up
|
|
|
|
* equivalence classes. (This could result in further additions or
|
|
|
|
* mergings of classes.)
|
2000-07-24 05:11:01 +02:00
|
|
|
*/
|
2007-01-20 21:45:41 +01:00
|
|
|
reconsider_outer_join_clauses(root);
|
2000-07-24 05:11:01 +02:00
|
|
|
|
2000-02-15 21:49:31 +01:00
|
|
|
/*
|
2007-01-20 21:45:41 +01:00
|
|
|
* If we formed any equivalence classes, generate additional restriction
|
2014-05-06 18:12:18 +02:00
|
|
|
* clauses as appropriate. (Implied join clauses are formed on-the-fly
|
2007-01-20 21:45:41 +01:00
|
|
|
* later.)
|
|
|
|
*/
|
|
|
|
generate_base_implied_equalities(root);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We have completed merging equivalence sets, so it's now possible to
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
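As a rough illustration of the callback approach this commit message describes, the sketch below defers the computation of "pathkeys" until the planner core has finished merging equivalence classes. Every name in it (ToyPlannerInfo, ToySortExtra, toy_qp_callback, toy_standard_qp_callback, toy_query_planner) is a hypothetical stand-in invented for the example; none of them are PostgreSQL's actual definitions, and the integer "pathkeys" field is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PlannerInfo; not the real struct. */
typedef struct ToyPlannerInfo
{
    int eclasses_merged;    /* nonzero once equivalence-class merging is done */
    int query_pathkeys;     /* toy "pathkeys"; 0 means not yet computed */
} ToyPlannerInfo;

/* Hypothetical caller-owned data handed back to the callback. */
typedef struct ToySortExtra
{
    int sort_key;
} ToySortExtra;

/* Callback type: the caller supplies the grouping/sorting knowledge. */
typedef void (*toy_qp_callback) (ToyPlannerInfo *root, void *extra);

/* Caller-side callback: fills in pathkeys once they can be canonical. */
void
toy_standard_qp_callback(ToyPlannerInfo *root, void *extra)
{
    ToySortExtra *qp_extra = (ToySortExtra *) extra;

    /* Only legal after merging has finished. */
    assert(root->eclasses_merged);
    root->query_pathkeys = qp_extra->sort_key;
}

/* Planner core: knows nothing about sorting, it just calls back. */
int
toy_query_planner(ToyPlannerInfo *root,
                  toy_qp_callback qp_callback, void *qp_extra)
{
    /* ... build and merge equivalence classes (elided in this toy) ... */
    root->eclasses_merged = 1;

    /* Only now is it safe to create pathkeys. */
    (*qp_callback) (root, qp_extra);

    return root->query_pathkeys;
}
```

The point of the design is that the core routine takes no extra state fields of its own: the caller passes an opaque pointer and a function, and the core decides only when the function runs.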
2013-04-29 20:49:01 +02:00
|
|
|
* generate pathkeys in canonical form; so compute query_pathkeys and
|
|
|
|
* other pathkeys fields in PlannerInfo.
|
2000-02-15 21:49:31 +01:00
|
|
|
*/
|
2013-04-29 20:49:01 +02:00
|
|
|
(*qp_callback) (root, qp_extra);
|
2000-02-15 21:49:31 +01:00
|
|
|
|
2008-10-21 22:42:53 +02:00
|
|
|
/*
|
|
|
|
* Examine any "placeholder" expressions generated during subquery pullup.
|
2010-09-28 18:08:56 +02:00
|
|
|
* Make sure that the Vars they need are marked as needed at the relevant
|
2014-05-06 18:12:18 +02:00
|
|
|
* join level. This must be done before join removal because it might
|
2010-09-28 18:08:56 +02:00
|
|
|
* cause Vars or placeholders to be needed above a join when they weren't
|
|
|
|
* so marked before.
|
2008-10-21 22:42:53 +02:00
|
|
|
*/
|
2010-09-28 18:08:56 +02:00
|
|
|
fix_placeholder_input_needed_levels(root);
|
2008-10-21 22:42:53 +02:00
|
|
|
|
2010-03-29 00:59:34 +02:00
|
|
|
/*
|
2014-05-06 18:12:18 +02:00
|
|
|
* Remove any useless outer joins. Ideally this would be done during
|
2010-03-29 00:59:34 +02:00
|
|
|
* jointree preprocessing, but the necessary information isn't available
|
|
|
|
* until we've built baserel data structures and classified qual clauses.
|
|
|
|
*/
|
|
|
|
joinlist = remove_useless_joins(root, joinlist);
|
|
|
|
|
2017-05-01 20:53:42 +02:00
|
|
|
/*
|
|
|
|
* Also, reduce any semijoins with unique inner rels to plain inner joins.
|
|
|
|
* Likewise, this can't be done until now for lack of needed info.
|
|
|
|
*/
|
|
|
|
reduce_unique_semijoins(root);
|
|
|
|
|
2010-03-29 00:59:34 +02:00
|
|
|
/*
|
|
|
|
* Now distribute "placeholders" to base rels as needed. This has to be
|
|
|
|
* done after join removal because removal could change whether a
|
2017-02-06 10:33:58 +01:00
|
|
|
* placeholder is evaluable at a base rel.
|
2010-03-29 00:59:34 +02:00
|
|
|
*/
|
|
|
|
add_placeholders_to_base_rels(root);
|
|
|
|
|
2013-08-18 02:22:37 +02:00
|
|
|
/*
|
2015-12-11 21:52:16 +01:00
|
|
|
* Construct the lateral reference sets now that we have finalized
|
|
|
|
* PlaceHolderVar eval levels.
|
2013-08-18 02:22:37 +02:00
|
|
|
*/
|
|
|
|
create_lateral_join_info(root);
|
|
|
|
|
2016-06-18 21:22:34 +02:00
|
|
|
/*
|
|
|
|
* Match foreign keys to equivalence classes and join quals. This must be
|
|
|
|
* done after finalizing equivalence classes, and it's useful to wait till
|
|
|
|
* after join removal so that we can skip processing foreign keys
|
|
|
|
* involving removed relations.
|
|
|
|
*/
|
|
|
|
match_foreign_keys_to_quals(root);
|
|
|
|
|
Extract restriction OR clauses whether or not they are indexable.
It's possible to extract a restriction OR clause from a join clause that
has the form of an OR-of-ANDs, if each sub-AND includes a clause that
mentions only one specific relation. While PG has been aware of that idea
for many years, the code previously only did it if it could extract an
indexable OR clause. On reflection, though, that seems a silly limitation:
adding a restriction clause can be a win by reducing the number of rows
that have to be filtered at the join step, even if we have to test the
clause as a plain filter clause during the scan. This should be especially
useful for foreign tables, where the change can cut the number of rows that
have to be retrieved from the foreign server; but testing shows it can win
even on local tables. Per a suggestion from Robert Haas.
As a heuristic, I made the code accept an extracted restriction clause
if its estimated selectivity is less than 0.9, which will probably result
in accepting extracted clauses just about always. We might need to tweak
that later based on experience.
Since the code no longer has even a weak connection to Path creation,
remove orindxpath.c and create a new file optimizer/util/orclauses.c.
There's some additional janitorial cleanup of now-dead code that needs
to happen, but it seems like that's a fit subject for a separate commit.
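The extraction rule in this commit message can be sketched in a few lines. The toy below models a join clause as an OR of AND-arms, where each sub-clause carries a bitmask of the relations it mentions; it extracts a single-relation OR clause only if every arm contributes at least one sub-clause that mentions that relation alone. The types and functions (ToyClause, ToyAndArm, ToyOrClause, toy_extract_restriction_or, toy_accept_extracted_clause) are hypothetical and are not taken from orclauses.c; only the 0.9 selectivity cutoff comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARMS        8
#define TOY_MAX_SUBCLAUSES  8

/* Hypothetical sub-clause: which relations it mentions, plus a label. */
typedef struct ToyClause
{
    unsigned relids;    /* bitmask of referenced relations */
    int      id;        /* label used only to identify the clause */
} ToyClause;

/* One arm of the OR: an implicit AND of sub-clauses. */
typedef struct ToyAndArm
{
    size_t    nclauses;
    ToyClause clauses[TOY_MAX_SUBCLAUSES];
} ToyAndArm;

/* An OR-of-ANDs clause. */
typedef struct ToyOrClause
{
    size_t    narms;
    ToyAndArm arms[TOY_MAX_ARMS];
} ToyOrClause;

/*
 * Try to extract an OR clause that mentions only relation "rel" (a single
 * bit).  Returns 1 and fills *result on success, 0 if some arm has no
 * sub-clause restricted to that relation.
 */
int
toy_extract_restriction_or(const ToyOrClause *join_or, unsigned rel,
                           ToyOrClause *result)
{
    size_t i, j;

    assert(join_or->narms <= TOY_MAX_ARMS);
    result->narms = 0;

    for (i = 0; i < join_or->narms; i++)
    {
        const ToyAndArm *arm = &join_or->arms[i];
        ToyAndArm  *out = &result->arms[result->narms];

        out->nclauses = 0;
        for (j = 0; j < arm->nclauses; j++)
        {
            if (arm->clauses[j].relids == rel)
                out->clauses[out->nclauses++] = arm->clauses[j];
        }

        /* One arm with nothing usable makes the whole extraction invalid. */
        if (out->nclauses == 0)
        {
            result->narms = 0;
            return 0;
        }
        result->narms++;
    }
    return result->narms > 0;
}

/* Heuristic from the commit message: keep the clause if selectivity < 0.9. */
int
toy_accept_extracted_clause(double selectivity)
{
    return selectivity < 0.9;
}
```

For a clause shaped like `(a.x = 1 AND a.k = b.k) OR (a.x = 2 AND b.y = 5)`, this toy would extract `a.x = 1 OR a.x = 2` for relation `a`, and nothing for relation `b`, because the first arm has no sub-clause that mentions `b` alone.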
2013-12-30 18:24:37 +01:00
|
|
|
/*
|
|
|
|
* Look for join OR clauses that we can extract single-relation
|
|
|
|
* restriction OR clauses from.
|
|
|
|
*/
|
|
|
|
extract_restriction_or_clauses(root);
|
|
|
|
|
2000-02-15 21:49:31 +01:00
|
|
|
/*
|
|
|
|
* Ready to do the primary planning.
|
|
|
|
*/
|
2005-12-20 03:30:36 +01:00
|
|
|
final_rel = make_one_rel(root, joinlist);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
/* Check that we got at least one usable path */
|
Adjust definition of cheapest_total_path to work better with LATERAL.
In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths. It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization. (The first rule is actually
a special case of the second.)
This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available. This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.
Along the way, simplify management of parameterized paths in add_path()
and friends. In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible. But we didn't take that reasoning as far as
we should have. Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well. Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels. So this
should be a bit faster as well as simpler.
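The selection rule for cheapest_total_path stated in this commit message can be illustrated with a small toy. The sketch below picks the cheapest path among those with the smallest parameterization, so an unparameterized path wins whenever one exists. It measures "smallest" by counting required outer relations in a bitmask, which is an assumption made for this example; the real planner compares relid sets and may resolve incomparable sets differently. ToyPath, toy_popcount and toy_cheapest_total_path are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path: required outer rels as a bitmask, plus total cost. */
typedef struct ToyPath
{
    unsigned required_outer;    /* 0 means unparameterized */
    double   total_cost;
} ToyPath;

/* Count the set bits, i.e. the number of required outer relations. */
int
toy_popcount(unsigned x)
{
    int n = 0;

    while (x != 0)
    {
        n += (int) (x & 1u);
        x >>= 1;
    }
    return n;
}

/*
 * Return the cheapest-total-cost path among those with the fewest required
 * outer relations, or NULL if there are no paths.  Because an
 * unparameterized path has zero required rels, "cheapest unparameterized
 * path if there is one" falls out as a special case.
 */
const ToyPath *
toy_cheapest_total_path(const ToyPath *paths, size_t npaths)
{
    const ToyPath *best = NULL;
    int     best_nparams = 0;
    size_t  i;

    for (i = 0; i < npaths; i++)
    {
        int nparams = toy_popcount(paths[i].required_outer);

        if (best == NULL ||
            nparams < best_nparams ||
            (nparams == best_nparams &&
             paths[i].total_cost < best->total_cost))
        {
            best = &paths[i];
            best_nparams = nparams;
        }
    }
    return best;
}
```

Under this rule every relation with at least one path has some path nominated as cheapest-total, which is why the check after make_one_rel can additionally insist that the final relation's nominee carries no parameterization at all.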
2012-08-30 04:05:27 +02:00
|
|
|
if (!final_rel || !final_rel->cheapest_total_path ||
|
|
|
|
final_rel->cheapest_total_path->param_info != NULL)
|
2003-07-25 02:01:09 +02:00
|
|
|
elog(ERROR, "failed to construct the join relation");
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2013-08-05 21:00:57 +02:00
|
|
|
return final_rel;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|