src/backend/optimizer/README

Optimizer
=========

These directories take the Query structure returned by the parser, and
generate a plan used by the executor. The /plan directory generates the
actual output plan, the /path code generates all possible ways to join the
tables, and /prep handles various preprocessing steps for special cases.
/util is utility stuff. /geqo is the separate "genetic optimization" planner
--- it does a semi-random search through the join tree space, rather than
exhaustively considering all possible join trees. (But each join considered
by /geqo is given to /path to create paths for, so we consider all possible
implementation paths for each specific join pair even in GEQO mode.)

Paths and Join Pairs
--------------------

During the planning/optimizing process, we build "Path" trees representing
the different ways of doing a query. We select the cheapest Path that
generates the desired relation and turn it into a Plan to pass to the
executor. (There is pretty much a one-to-one correspondence between the
Path and Plan trees, but Path nodes omit info that won't be needed during
planning, and include info needed for planning that won't be needed by the
executor.)

The optimizer builds a RelOptInfo structure for each base relation used in
the query. Base rels are either primitive tables, or subquery subselects
that are planned via a separate recursive invocation of the planner. A
RelOptInfo is also built for each join relation that is considered during
planning. A join rel is simply a combination of base rels. There is only
one join RelOptInfo for any given set of baserels --- for example, the join
{A B C} is represented by the same RelOptInfo no matter whether we build it
by joining A and B first and then adding C, or joining B and C first and
then adding A, etc. These different means of building the joinrel are
represented as Paths. For each RelOptInfo we build a list of Paths that
represent plausible ways to implement the scan or join of that relation.
Once we've considered all the plausible Paths for a rel, we select the one
that is cheapest according to the planner's cost estimates. The final plan
is derived from the cheapest Path for the RelOptInfo that includes all the
base rels of the query.

Possible Paths for a primitive table relation include plain old sequential
scan, plus index scans for any indexes that exist on the table, plus bitmap
index scans using one or more indexes. A subquery base relation just has
one Path, a "SubqueryScan" path (which links to the subplan that was built
by a recursive invocation of the planner). Likewise a function-RTE base
relation has only one possible Path.

Joins always occur using two RelOptInfos. One is outer, the other inner.
Outers drive lookups of values in the inner. In a nested loop, lookups of
values in the inner occur by scanning the inner path once per outer tuple
to find each matching inner row. In a mergejoin, inner and outer rows are
ordered, and are accessed in order, so only one scan is required to perform
the entire join: both inner and outer paths are scanned in-sync. (There's
not a lot of difference between inner and outer in a mergejoin...) In a
hashjoin, the inner is scanned first and all its rows are entered in a
hashtable, then the outer is scanned and for each row we look up the join
key in the hashtable.

A Path for a join relation is actually a tree structure, with the top
Path node representing the join method. It has left and right subpaths
that represent the scan or join methods used for the two input relations.

Join Tree Construction
----------------------

The optimizer generates optimal query plans by doing a more-or-less
exhaustive search through the ways of executing the query. The best Path
tree is found by a recursive process:

1) Take each base relation in the query, and make a RelOptInfo structure
for it. Find each potentially useful way of accessing the relation,
including sequential and index scans, and make Paths representing those
ways. All the Paths made for a given relation are placed in its
RelOptInfo.pathlist. (Actually, we discard Paths that are obviously
inferior alternatives before they ever get into the pathlist --- what
ends up in the pathlist is the cheapest way of generating each potentially
useful sort ordering of the relation.) Also create a RelOptInfo.joininfo
list including all the join clauses that involve this relation. For
example, the WHERE clause "tab1.col1 = tab2.col1" generates entries in
both tab1 and tab2's joininfo lists.

If we have only a single base relation in the query, we are done.
Otherwise we have to figure out how to join the base relations into a
single join relation.

2) Normally, any explicit JOIN clauses are "flattened" so that we just
have a list of relations to join. However, FULL OUTER JOIN clauses are
never flattened, and other kinds of JOIN might not be either, if the
flattening process is stopped by join_collapse_limit or from_collapse_limit
restrictions. Therefore, we end up with a planning problem that contains
lists of relations to be joined in any order, where any individual item
might be a sub-list that has to be joined together before we can consider
joining it to its siblings. We process these sub-problems recursively,
bottom up. Note that the join list structure constrains the possible join
orders, but it doesn't constrain the join implementation method at each
join (nestloop, merge, hash), nor does it say which rel is considered outer
or inner at each join. We consider all these possibilities in building
Paths. We generate a Path for each feasible join method, and select the
cheapest Path.

For each planning problem, therefore, we will have a list of relations
that are either base rels or joinrels constructed per sub-join-lists.
We can join these rels together in any order the planner sees fit.
The standard (non-GEQO) planner does this as follows:

Consider joining each RelOptInfo to each other RelOptInfo for which there
is a usable joinclause, and generate a Path for each possible join method
for each such pair. (If we have a RelOptInfo with no join clauses, we have
no choice but to generate a clauseless Cartesian-product join; so we
consider joining that rel to each other available rel. But in the presence
of join clauses we will only consider joins that use available join
clauses. Note that join-order restrictions induced by outer joins and
IN/EXISTS clauses are also checked, to ensure that we find a workable join
order in cases where those restrictions force a clauseless join to be done.)

If we only had two relations in the list, we are done: we just pick
the cheapest path for the join RelOptInfo. If we had more than two, we now
need to consider ways of joining join RelOptInfos to each other to make
join RelOptInfos that represent more than two list items.

The join tree is constructed using a "dynamic programming" algorithm:
in the first pass (already described) we consider ways to create join rels
representing exactly two list items. The second pass considers ways
to make join rels that represent exactly three list items; the next pass,
four items, etc. The last pass considers how to make the final join
relation that includes all list items --- obviously there can be only one
join rel at this top level, whereas there can be more than one join rel
at lower levels. At each level we use joins that follow available join
clauses, if possible, just as described for the first level.

For example:

    SELECT  *
    FROM    tab1, tab2, tab3, tab4
    WHERE   tab1.col = tab2.col AND
            tab2.col = tab3.col AND
            tab3.col = tab4.col

    Tables 1, 2, 3, and 4 are joined as:
    {1 2},{2 3},{3 4}
    {1 2 3},{2 3 4}
    {1 2 3 4}
    (other possibilities will be excluded for lack of join clauses)

    SELECT  *
    FROM    tab1, tab2, tab3, tab4
    WHERE   tab1.col = tab2.col AND
            tab1.col = tab3.col AND
            tab1.col = tab4.col

    Tables 1, 2, 3, and 4 are joined as:
    {1 2},{1 3},{1 4}
    {1 2 3},{1 3 4},{1 2 4}
    {1 2 3 4}

We consider left-handed plans (the outer rel of an upper join is a joinrel,
but the inner is always a single list item); right-handed plans (outer rel
is always a single item); and bushy plans (both inner and outer can be
joins themselves). For example, when building {1 2 3 4} we consider
joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
{1 2} to {3 4} (bushy), among other choices. Although the jointree
scanning code produces these potential join combinations one at a time,
all the ways to produce the same set of joined base rels will share the
same RelOptInfo, so the paths produced from different join combinations
that produce equivalent joinrels will compete in add_path().

Once we have built the final join rel, we use either the cheapest path
for it or the cheapest path with the desired ordering (if that's cheaper
than applying a sort to the cheapest other path).

If the query contains one-sided outer joins (LEFT or RIGHT joins), or
IN or EXISTS WHERE clauses that were converted to joins, then some of
the possible join orders may be illegal. These are excluded by having
join_is_legal consult a side list of such "special" joins to see
whether a proposed join is illegal. (The same consultation allows it
to see which join style should be applied for a valid join, i.e.,
JOIN_INNER, JOIN_LEFT, etc.)

Valid OUTER JOIN Optimizations
------------------------------

The planner's treatment of outer join reordering is based on the following
identities:

1.  (A leftjoin B on (Pab)) innerjoin C on (Pac)
    = (A innerjoin C on (Pac)) leftjoin B on (Pab)

where Pac is a predicate referencing A and C, etc. (in this case, clearly
Pac cannot reference B, or the transformation is nonsensical).

2.  (A leftjoin B on (Pab)) leftjoin C on (Pac)
    = (A leftjoin C on (Pac)) leftjoin B on (Pab)

3.  (A leftjoin B on (Pab)) leftjoin C on (Pbc)
    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

Identity 3 only holds if predicate Pbc must fail for all-null B rows
(that is, Pbc is strict for at least one column of B). If Pbc is not
strict, the first form might produce some rows with nonnull C columns
where the second form would make those entries null.

RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
tables, so the same identities work for right joins.

An example of a case that does *not* work is moving an innerjoin into or
out of the nullable side of an outer join:

    A leftjoin (B join C on (Pbc)) on (Pab)
    != (A leftjoin B on (Pab)) join C on (Pbc)

SEMI joins work a little bit differently. A semijoin can be reassociated
into or out of the lefthand side of another semijoin, left join, or
antijoin, but not into or out of the righthand side. Likewise, an inner
join, left join, or antijoin can be reassociated into or out of the
lefthand side of a semijoin, but not into or out of the righthand side.

ANTI joins work approximately like LEFT joins, except that identity 3
fails if the join to C is an antijoin (even if Pbc is strict, and in
both the cases where the other join is a leftjoin and where it is an
antijoin). So we can't reorder antijoins into or out of the RHS of a
leftjoin or antijoin, even if the relevant clause is strict.

The current code does not attempt to re-order FULL JOINs at all.
FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
translating the jointree to "joinlist" representation. Other types of
JOIN nodes are normally collapsed so that they participate fully in the
join order search. To avoid generating illegal join orders, the planner
creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
checks this list to decide if a proposed join is legal.

What we store in SpecialJoinInfo nodes are the minimum sets of Relids
required on each side of the join to form the outer join. Note that
these are minimums; there's no explicit maximum, since joining other
rels to the OJ's syntactic rels may be legal. Per identities 1 and 2,
non-FULL joins can be freely associated into the lefthand side of an
OJ, but in some cases they can't be associated into the righthand side.
So the restriction enforced by join_is_legal is that a proposed join
can't join a rel within or partly within an RHS boundary to one outside
the boundary, unless the join validly implements some outer join.
(To support use of identity 3, we have to allow cases where an apparent
violation of a lower OJ's RHS is committed while forming an upper OJ.
If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
set must be expanded to include the whole of the lower OJ, thereby
preventing it from being formed before the lower OJ is.)

Pulling Up Subqueries
---------------------

As we described above, a subquery appearing in the range table is planned
independently and treated as a "black box" during planning of the outer
query. This is necessary when the subquery uses features such as
aggregates, GROUP, or DISTINCT. But if the subquery is just a simple
scan or join, treating the subquery as a black box may produce a poor plan
compared to considering it as part of the entire plan search space.
Therefore, at the start of the planning process the planner looks for
simple subqueries and pulls them up into the main query's jointree.

Pulling up a subquery may result in FROM-list joins appearing below the top
of the join tree. Each FROM-list is planned using the dynamic-programming
search method described above.

If pulling up a subquery produces a FROM-list as a direct child of another
FROM-list, then we can merge the two FROM-lists together. Once that's
done, the subquery is an absolutely integral part of the outer query and
will not constrain the join tree search space at all. However, that could
result in unpleasant growth of planning time, since the dynamic-programming
search has runtime exponential in the number of FROM-items considered.
Therefore, we don't merge FROM-lists if the result would have too many
FROM-items in one list.

Optimizer Functions
-------------------

The primary entry point is planner().

planner()
 set up for recursive handling of subqueries
 do final cleanup after planning
-subquery_planner()
 pull up sublinks and subqueries from rangetable, if possible
 canonicalize qual
     Attempt to simplify WHERE clause to the most useful form; this includes
     flattening nested AND/ORs and detecting clauses that are duplicated in
     different branches of an OR.
 simplify constant expressions
 process sublinks
 convert Vars of outer query levels into Params
--grouping_planner()
  preprocess target list for non-SELECT queries
  handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
      ORDER BY, DISTINCT, LIMIT
--query_planner()
   pull out constant quals, which can be used to gate execution of the
      whole plan (if any are found, we make a top-level Result node
      to do the gating)
   make list of base relations used in query
   split up the qual into restrictions (a=1) and joins (b=c)
   find qual clauses that enable merge and hash joins
----make_one_rel()
     set_base_rel_pathlist()
      find seqscan and all index paths for each base relation
      find selectivity of columns used in joins
     make_rel_from_joinlist()
      hand off join subproblems to a plugin, GEQO, or standard_join_search()
-----standard_join_search()
      call join_search_one_level() for each level of join tree needed
      join_search_one_level():
        For each joinrel of the prior level, do make_rels_by_clause_joins()
        if it has join clauses, or make_rels_by_clauseless_joins() if not.
        Also generate "bushy plan" joins between joinrels of lower levels.
      Back at standard_join_search(), apply set_cheapest() to extract the
      cheapest path for each newly constructed joinrel.
      Loop back if this wasn't the top join level.
   Back at query_planner:
    put back any constant quals by adding a Result node
 Back at grouping_planner:
 do grouping(GROUP)
 do aggregates
 make unique(DISTINCT)
 make sort(ORDER BY)
 make limit(LIMIT/OFFSET)

Optimizer Data Structures
-------------------------

PlannerGlobal   - global information for a single planner invocation

PlannerInfo     - information for planning a particular Query (we make
                  a separate PlannerInfo node for each sub-Query)

RelOptInfo      - a relation or joined relations

 RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"
                  (note the same structure is used for restriction and
                   join clauses)

 Path           - every way to generate a RelOptInfo (sequential, index, joins)
  SeqScan       - represents a sequential scan plan
  IndexPath     - index scan
  BitmapHeapPath - top of a bitmapped index scan
  TidPath       - scan by CTID
  AppendPath    - append multiple subpaths together
  MergeAppendPath - merge multiple subpaths, preserving their common sort order
  ResultPath    - a Result plan node (used for FROM-less SELECT)
  MaterialPath  - a Material plan node
  UniquePath    - remove duplicate rows
  NestPath      - nested-loop joins
  MergePath     - merge joins
  HashPath      - hash joins

 EquivalenceClass - a data structure representing a set of values known equal

 PathKey        - a data structure representing the sort ordering of a path

The optimizer spends a good deal of its time worrying about the ordering
of the tuples returned by a path. The reason this is useful is that by
knowing the sort ordering of a path, we may be able to use that path as
the left or right input of a mergejoin and avoid an explicit sort step.
Nestloops and hash joins don't really care what the order of their inputs
is, but mergejoin needs suitably ordered inputs. Therefore, all paths
generated during the optimization process are marked with their sort order
(to the extent that it is known) for possible use by a higher-level merge.

It is also possible to avoid an explicit sort step to implement a user's
ORDER BY clause if the final path has the right ordering already, so the
sort ordering is of interest even at the top level. query_planner() will
look for the cheapest path with a sort order matching the desired order,
and grouping_planner() will compare its cost to the cost of using the
cheapest-overall path and doing an explicit sort.

When we are generating paths for a particular RelOptInfo, we discard a path
if it is more expensive than another known path that has the same or better
sort order. We will never discard a path that is the only known way to
achieve a given sort order (without an explicit sort, that is). In this
way, the next level up will have the maximum freedom to build mergejoins
without sorting, since it can pick from any of the paths retained for its
inputs.

EquivalenceClasses
------------------

During the deconstruct_jointree() scan of the query's qual clauses, we look
for mergejoinable equality clauses A = B whose applicability is not delayed
by an outer join; these are called "equivalence clauses". When we find
one, we create an EquivalenceClass containing the expressions A and B to
record this knowledge. If we later find another equivalence clause B = C,
we add C to the existing EquivalenceClass for {A B}; this may require
merging two existing EquivalenceClasses. At the end of the scan, we have
sets of values that are all known to be transitively equal to each other.
We can therefore use a comparison of any pair of the values as a
restriction or join clause (when these values are available at the scan or
join, of course); furthermore, we need to test only one such comparison,
not all of them. Therefore, equivalence clauses are removed from the
standard qual distribution process. Instead, when preparing a restriction
or join clause list, we examine each EquivalenceClass to see if it can
contribute a clause, and if so we select an appropriate pair of values to
compare. For example, if we are trying to join A's relation to C's, we can
generate the clause A = C, even though this appeared nowhere explicitly in
the original query. This may allow us to explore join paths that otherwise
would have been rejected as requiring Cartesian-product joins.

Sometimes an EquivalenceClass may contain a pseudo-constant expression
(i.e., one not containing Vars or Aggs of the current query level, nor
volatile functions). In this case we do not follow the policy of
dynamically generating join clauses: instead, we dynamically generate
restriction clauses "var = const" wherever one of the variable members of
the class can first be computed. For example, if we have A = B and B = 42,
we effectively generate the restriction clauses A = 42 and B = 42, and then
we need not bother with explicitly testing the join clause A = B when the
relations are joined. In effect, all the class members can be tested at
relation-scan level and there's never a need for join tests.

The precise technical interpretation of an EquivalenceClass is that it
asserts that at any plan node where more than one of its member values
can be computed, output rows in which the values are not all equal may
be discarded without affecting the query result. (We require all levels
of the plan to enforce EquivalenceClasses, hence a join need not recheck
equality of values that were computable by one of its children.) For an
ordinary EquivalenceClass that is "valid everywhere", we can further infer
that the values are all non-null, because all mergejoinable operators are
strict. However, we also allow equivalence clauses that appear below the
nullable side of an outer join to form EquivalenceClasses; for these
classes, the interpretation is that either all the values are equal, or
all (except pseudo-constants) have gone to null. (This requires a
limitation that non-constant members be strict, else they might not go
to null when the other members do.) Consider for example

    SELECT *
      FROM a LEFT JOIN
           (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
           ON a.x = ss.y
      WHERE a.x = 42;

We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed
clauses from forming EquivalenceClasses is exactly that we want to be able
to push any derived clauses as far down as possible.) But once above the
outer join it's no longer necessarily the case that b.y = 10, and thus we
cannot use such EquivalenceClasses to conclude that sorting is unnecessary
(see discussion of PathKeys below).
|
|
|
|
|
|
|
|
In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
equivalence clause because its applicability to b is delayed by the outer
join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
But since we see that a.x has been equated to 42 above the outer join, we
are able to form a below-outer-join class {b.y 42}; this restriction can be
added because no b/c row not having b.y = 42 can contribute to the result
of the outer join, and so we need not compute such rows. Now this class
will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
which lets the planner deduce that the b/c join need not be computed at all
because none of its rows can contribute to the outer join. (This gets
implemented as a gating Result filter, since more usually the potential
contradiction involves Param values rather than just Consts, and thus has
to be checked at runtime.)

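The merge that exposes the contradiction can be sketched as follows. `ToyEC` and `merge_ecs` are hypothetical simplifications (string members, a single int constant). The sketch only sets a flag, whereas, per the text, the real planner turns such a conflict into a gating Result filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MEMBERS 8

/* Hypothetical EquivalenceClass: variable members by name, plus at most
 * one constant.  Real ECs hold expression trees, not strings. */
typedef struct
{
    const char *vars[MAX_MEMBERS];
    int         nvars;
    bool        has_const;
    int         constval;
    bool        contradictory;  /* two different constants were equated */
} ToyEC;

static bool
ec_has_var(const ToyEC *ec, const char *var)
{
    for (int i = 0; i < ec->nvars; i++)
        if (strcmp(ec->vars[i], var) == 0)
            return true;
    return false;
}

/* Merge src into dst, as happens once a shared member links the classes. */
static void
merge_ecs(ToyEC *dst, const ToyEC *src)
{
    for (int i = 0; i < src->nvars; i++)
    {
        if (ec_has_var(dst, src->vars[i]))
            continue;           /* already a member, e.g. b.y */
        assert(dst->nvars < MAX_MEMBERS);
        dst->vars[dst->nvars++] = src->vars[i];
    }
    if (src->has_const)
    {
        if (dst->has_const && dst->constval != src->constval)
            dst->contradictory = true;  /* e.g. 10 = 42: no row qualifies */
        else
        {
            dst->has_const = true;
            dst->constval = src->constval;
        }
    }
    if (src->contradictory)
        dst->contradictory = true;
}
```

Merging {b.y 42} into {b.y c.z 10} keeps two variable members and flags the class as contradictory.
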
To aid in determining the sort ordering(s) that can work with a mergejoin,
we mark each mergejoinable clause with the EquivalenceClasses of its left
and right inputs. For an equivalence clause, these are of course the same
EquivalenceClass. For a non-equivalence mergejoinable clause (such as an
outer-join qualification), we generate two separate EquivalenceClasses for
the left and right inputs. This may result in creating single-item
equivalence "classes", though of course these are still subject to merging
if other equivalence clauses are later found to bear on the same
expressions.

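Marking a non-equivalence mergeclause might look roughly like this. The names `ECList`, `get_or_make_ec` and `MergeClauseECs` are hypothetical; the point is that each side either finds the class it already belongs to or gets a fresh single-item class, which remains eligible for later merging.

```c
#include <assert.h>
#include <string.h>

#define MAX_ECS     16
#define MAX_MEMBERS 8

/* Hypothetical EquivalenceClass holding member expressions by name. */
typedef struct
{
    const char *members[MAX_MEMBERS];
    int         nmembers;
} ToyEC;

/* All classes known so far; pointers into ecs[] stay valid. */
typedef struct
{
    ToyEC ecs[MAX_ECS];
    int   necs;
} ECList;

/* Hypothetical annotation stored on a mergejoinable clause. */
typedef struct
{
    ToyEC *left_ec;
    ToyEC *right_ec;
} MergeClauseECs;

/* Find the class containing expr, or create a single-item class for it. */
static ToyEC *
get_or_make_ec(ECList *list, const char *expr)
{
    for (int i = 0; i < list->necs; i++)
        for (int j = 0; j < list->ecs[i].nmembers; j++)
            if (strcmp(list->ecs[i].members[j], expr) == 0)
                return &list->ecs[i];

    assert(list->necs < MAX_ECS);
    ToyEC *ec = &list->ecs[list->necs++];
    ec->members[0] = expr;
    ec->nmembers = 1;
    return ec;
}

/* Annotate a non-equivalence mergejoinable clause "left = right". */
static MergeClauseECs
mark_mergeclause(ECList *list, const char *left, const char *right)
{
    MergeClauseECs result;

    result.left_ec = get_or_make_ec(list, left);
    result.right_ec = get_or_make_ec(list, right);
    return result;
}
```

Marking a.x = b.y and then a.x = c.z reuses the {a.x} class for the second clause, while b.y and c.z each get their own single-item class.
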
Another way that we may form a single-item EquivalenceClass is in creation
of a PathKey to represent a desired sort order (see below). This is a bit
different from the above cases because such an EquivalenceClass might
contain an aggregate function or volatile expression. (A clause containing
a volatile function will never be considered mergejoinable, even if its top
operator is mergejoinable, so there is no way for a volatile expression to
get into EquivalenceClasses otherwise. Aggregates are disallowed in WHERE
altogether, so they will never be found in a mergejoinable clause.) This is
just a convenience to maintain a uniform PathKey representation: such an
EquivalenceClass will never be merged with any other. Note in particular
that a single-item EquivalenceClass {a.x} is *not* meant to imply an
assertion that a.x = a.x; the practical effect of this is that a.x could
be NULL.

An EquivalenceClass also contains a list of btree opfamily OIDs, which
determines what the equalities it represents actually "mean". All the
equivalence clauses that contribute to an EquivalenceClass must have
equality operators that belong to the same set of opfamilies. (Note: most
of the time, a particular equality operator belongs to only one family, but
it's possible that it belongs to more than one. We keep track of all the
families to ensure that we can make use of an index belonging to any one of
the families for mergejoin purposes.)

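A sketch of the opfamily bookkeeping, assuming each opfamily set is kept as a small sorted array of OIDs. `OpfamilySet`, `clause_fits_ec` and `index_usable_for_ec` are hypothetical, and any OID values used with them are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;   /* same underlying type as PostgreSQL's Oid */

#define MAX_FAMILIES 4

/* Hypothetical: the btree opfamilies an equality operator belongs to,
 * kept sorted so two sets can be compared element by element. */
typedef struct
{
    Oid families[MAX_FAMILIES];
    int nfamilies;
} OpfamilySet;

/* A clause may contribute to an EC only if its operator's opfamily set
 * matches the EC's set exactly. */
static bool
clause_fits_ec(const OpfamilySet *ec_set, const OpfamilySet *op_set)
{
    assert(ec_set->nfamilies <= MAX_FAMILIES);
    if (ec_set->nfamilies != op_set->nfamilies)
        return false;
    for (int i = 0; i < ec_set->nfamilies; i++)
        if (ec_set->families[i] != op_set->families[i])
            return false;
    return true;
}

/* An index column whose opfamily is any one of the EC's families can
 * supply ordering for a mergejoin on that EC. */
static bool
index_usable_for_ec(const OpfamilySet *ec_set, Oid index_opfamily)
{
    for (int i = 0; i < ec_set->nfamilies; i++)
        if (ec_set->families[i] == index_opfamily)
            return true;
    return false;
}
```
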

PathKeys
--------

The PathKeys data structure represents what is known about the sort order
of the tuples generated by a particular Path. A path's pathkeys field is a
list of PathKey nodes, where the n'th item represents the n'th sort key of
the result. Each PathKey contains these fields:

	* a reference to an EquivalenceClass
	* a btree opfamily OID (must match one of those in the EC)
	* a sort direction (ascending or descending)
	* a nulls-first-or-last flag

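The four fields listed above map naturally onto a small C struct. This is a hypothetical simplification: `ToyPathKey`, `ToyEC`, `SortDir` and the two helper functions are not the planner's own names, and are shown only to make the structure concrete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Minimal stand-in for an EquivalenceClass; only its identity matters. */
typedef struct
{
    const char *name;
} ToyEC;

typedef enum
{
    SORT_ASC,
    SORT_DESC
} SortDir;

/* One sort key, carrying the four fields listed above. */
typedef struct
{
    const ToyEC *eclass;       /* the value being sorted on */
    Oid          opfamily;     /* must be one of the EC's btree opfamilies */
    SortDir      dir;          /* ascending or descending */
    bool         nulls_first;  /* NULLS FIRST rather than NULLS LAST */
} ToyPathKey;

static ToyPathKey
make_pathkey(const ToyEC *ec, Oid opfamily, SortDir dir, bool nulls_first)
{
    ToyPathKey pk = {ec, opfamily, dir, nulls_first};

    assert(ec != NULL);
    return pk;
}

/* Two keys denote the same ordering only if all four fields agree. */
static bool
same_sort_key(const ToyPathKey *a, const ToyPathKey *b)
{
    return a->eclass == b->eclass && a->opfamily == b->opfamily &&
        a->dir == b->dir && a->nulls_first == b->nulls_first;
}
```
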
The EquivalenceClass represents the value being sorted on. Since the
various members of an EquivalenceClass are known equal according to the
opfamily, we can consider a path sorted by any one of them to be sorted by
any other too; this is what justifies referencing the whole
EquivalenceClass rather than just one member of it.

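Because a PathKey names a whole class, asking whether a path is sorted by some expression becomes a membership test. A minimal sketch with hypothetical names, using string comparison where real code would compare expression trees:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MEMBERS 8

/* Hypothetical EquivalenceClass holding member expressions by name. */
typedef struct
{
    const char *members[MAX_MEMBERS];
    int         nmembers;
} ToyEC;

/* Sorting by this EC counts as sorting by expr if expr is any member. */
static bool
ec_sorts_by(const ToyEC *ec, const char *expr)
{
    assert(ec->nmembers <= MAX_MEMBERS);
    for (int i = 0; i < ec->nmembers; i++)
        if (strcmp(ec->members[i], expr) == 0)
            return true;
    return false;
}
```

A path sorted on the class {a.x b.y} thus counts as sorted by a.x and by b.y alike.
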
In single/base relation RelOptInfos, the Paths represent various ways
of scanning the relation and the resulting ordering of the tuples.
Sequential scan Paths have NIL pathkeys, indicating no known ordering.
Index scans have Path.pathkeys that represent the chosen index's ordering,
if any. A single-key index would create a single-PathKey list, while a
multi-column index generates a list with one element per index column.
(Actually, since an index can be scanned either forward or backward, there
are two possible sort orders and two possible PathKey lists it can
generate.)

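Generating the forward and backward pathkey lists for a multi-column index can be sketched as follows. The types and `index_pathkeys` are hypothetical; the sketch assumes, as for a btree index, that scanning backward reverses each column's direction and also moves nulls to the opposite end.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INDEX_COLS 4

typedef enum { SORT_ASC, SORT_DESC } SortDir;

/* Hypothetical sort key for one index column. */
typedef struct
{
    int     column;        /* index column number, 1-based */
    SortDir dir;
    bool    nulls_first;
} ToyKey;

/*
 * Build the pathkey list for scanning an index whose columns have the
 * given declared orderings: one key per column.  A backward scan reverses
 * each key's direction and flips where nulls appear.
 */
static int
index_pathkeys(const ToyKey *index_cols, int ncols, bool backward,
               ToyKey *out)
{
    assert(ncols <= MAX_INDEX_COLS);
    for (int i = 0; i < ncols; i++)
    {
        out[i] = index_cols[i];
        if (backward)
        {
            out[i].dir = (index_cols[i].dir == SORT_ASC) ? SORT_DESC : SORT_ASC;
            out[i].nulls_first = !index_cols[i].nulls_first;
        }
    }
    return ncols;
}
```

An index declared (x ASC NULLS LAST, y ASC NULLS LAST) thus offers both that ordering and (x DESC NULLS FIRST, y DESC NULLS FIRST).
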
Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
pathkeys since we can say nothing about the overall order of its result.
Also, an indexscan on an unordered type of index generates NIL pathkeys.
However, we can always create a pathkey by doing an explicit sort. The
pathkeys for a Sort plan's output just represent the sort key fields and
the ordering operators used.

Things get more interesting when we consider joins. Suppose we do a
mergejoin between A and B using the mergeclause A.X = B.Y. The output
of the mergejoin is sorted by X --- but it is also sorted by Y. Again,
this can be represented by a PathKey referencing an EquivalenceClass
containing both X and Y.

With a little further thought, it becomes apparent that nestloop joins
can also produce sorted output. For example, if we do a nestloop join
between outer relation A and inner relation B, then any pathkeys relevant
to A are still valid for the join result: we have not altered the order of
the tuples from A. Even more interesting, if there was an equivalence clause
A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
that B.Y is a pathkey for the join result; X was ordered before and still
is, and the joined values of Y are equal to the joined values of X, so Y
must now be ordered too. This is true even though we used neither an
explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted
on to preserve the order of their outer relation, because the executor
might decide to "batch" the join, so we always set pathkeys to NIL for
a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the
ordering of its outer relation, because it might insert nulls at random
points in the ordering.

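The join rules above reduce to a small decision. A sketch with hypothetical enum and function names, where NULL stands in for NIL; it assumes that a mergejoin, like a nestloop, emits rows in its outer input's order.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { JOIN_INNER, JOIN_LEFT, JOIN_RIGHT, JOIN_FULL } ToyJoinType;
typedef enum { NESTLOOP, MERGEJOIN, HASHJOIN } ToyJoinMethod;

/* Stand-in for a pathkey list; a NULL pointer plays the role of NIL. */
typedef struct
{
    int nkeys;
} ToyPathKeys;

/*
 * Sort order promised by a join path, given the outer path's pathkeys.
 * Hash joins promise nothing because they may batch; RIGHT and FULL joins
 * may insert nulls anywhere in the outer ordering; otherwise the outer
 * ordering carries through to the join result.
 */
static const ToyPathKeys *
join_result_pathkeys(ToyJoinType jointype, ToyJoinMethod method,
                     const ToyPathKeys *outer_pathkeys)
{
    assert(method == NESTLOOP || method == MERGEJOIN || method == HASHJOIN);
    if (method == HASHJOIN)
        return NULL;
    if (jointype == JOIN_RIGHT || jointype == JOIN_FULL)
        return NULL;
    return outer_pathkeys;
}
```
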
In general, we can justify using EquivalenceClasses as the basis for
pathkeys because, whenever we scan a relation containing multiple
EquivalenceClass members or join two relations each containing
EquivalenceClass members, we apply restriction or join clauses derived from
the EquivalenceClass. This guarantees that any two values listed in the
EquivalenceClass are in fact equal in all tuples emitted by the scan or
join, and therefore that if the tuples are sorted by one of the values,
they can be considered sorted by any other as well. It does not matter
whether the test clause is used as a mergeclause, or merely enforced
after-the-fact as a qpqual filter.

Note that there is no particular difficulty in labeling a path's sort
order with a PathKey referencing an EquivalenceClass that contains
variables not yet joined into the path's output. We can simply ignore
such entries as not being relevant (yet). This makes it possible to
use the same EquivalenceClasses throughout the join planning process.
In fact, by being careful not to generate multiple identical PathKey
objects, we can reduce comparison of EquivalenceClasses and PathKeys
to simple pointer comparison, which is a huge savings because add_path
has to make a large number of PathKey comparisons in deciding whether
competing Paths are equivalently sorted.

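The "no duplicate PathKey objects" idea can be sketched as an interning table. `CanonRegistry` and `make_canonical_pathkey` are hypothetical names, and a fixed-size array stands in for whatever storage the planner really uses.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

typedef struct
{
    const char *name;
} ToyEC;

typedef struct
{
    const ToyEC *eclass;
    Oid          opfamily;
    bool         descending;
    bool         nulls_first;
} ToyPathKey;

#define MAX_CANON 32

/* Hypothetical registry: each distinct sort key is stored exactly once. */
typedef struct
{
    ToyPathKey keys[MAX_CANON];
    int        nkeys;
} CanonRegistry;

/* Return the unique object for this key, creating it on first request. */
static const ToyPathKey *
make_canonical_pathkey(CanonRegistry *reg, const ToyEC *ec, Oid opfamily,
                       bool descending, bool nulls_first)
{
    for (int i = 0; i < reg->nkeys; i++)
    {
        const ToyPathKey *pk = &reg->keys[i];

        if (pk->eclass == ec && pk->opfamily == opfamily &&
            pk->descending == descending && pk->nulls_first == nulls_first)
            return pk;
    }
    assert(reg->nkeys < MAX_CANON);
    ToyPathKey *pk = &reg->keys[reg->nkeys++];
    pk->eclass = ec;
    pk->opfamily = opfamily;
    pk->descending = descending;
    pk->nulls_first = nulls_first;
    return pk;
}

/* With canonical keys, list comparison reduces to pointer comparison. */
static bool
pathkey_lists_equal(const ToyPathKey *const *a, int na,
                    const ToyPathKey *const *b, int nb)
{
    if (na != nb)
        return false;
    for (int i = 0; i < na; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```

The field-by-field search happens once, when a key is created; every later comparison is a single pointer test.
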
Pathkeys are also useful to represent an ordering that we wish to achieve,
since they are easily compared to the pathkeys of a potential candidate
path. So, SortGroupClause lists are turned into pathkeys lists for use
inside the optimizer.

Because we have to generate pathkeys lists from the sort clauses before
we've finished EquivalenceClass merging, we cannot use the pointer-equality
method of comparing PathKeys in the earliest stages of the planning
process. Instead, we generate "non-canonical" PathKeys that reference
single-element EquivalenceClasses that might get merged later. After we
complete EquivalenceClass merging, we replace these with "canonical"
PathKeys that reference only fully-merged classes, and after that we make
sure we don't generate more than one copy of each "canonical" PathKey.
Then it is safe to use pointer comparison on canonical PathKeys.

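One way to picture the non-canonical to canonical step: each class keeps a forwarding link once it has been merged into another, and canonicalizing a PathKey follows that link to the surviving class. The `merged_into` field and both functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical EC with a forwarding link set when it is merged away. */
typedef struct ToyEC
{
    const char   *name;
    struct ToyEC *merged_into;   /* NULL while this EC is still live */
} ToyEC;

typedef struct
{
    ToyEC *eclass;
    int    strategy;
} ToyPathKey;

/* Follow forwarding links to the class that survived merging. */
static ToyEC *
canonical_ec(ToyEC *ec)
{
    assert(ec != NULL);
    while (ec->merged_into != NULL)
        ec = ec->merged_into;
    return ec;
}

/* Replace a non-canonical key's class with its fully merged class. */
static ToyPathKey
canonicalize_pathkey(ToyPathKey pk)
{
    pk.eclass = canonical_ec(pk.eclass);
    return pk;
}
```

After {x} and {y} are merged into {x y}, keys built on either single-item class canonicalize to the same class pointer.
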
An additional refinement we can make is to insist that canonical pathkey
lists (sort orderings) do not mention the same EquivalenceClass more than
once. For example, in all these cases the second sort column is redundant,
because it cannot distinguish values that are the same according to the
first sort column:
	SELECT ... ORDER BY x, x
	SELECT ... ORDER BY x, x DESC
	SELECT ... WHERE x = y ORDER BY x, y
Although a user probably wouldn't write "ORDER BY x,x" directly, such
redundancies are more probable once equivalence classes have been
considered. Also, the system may generate redundant pathkey lists when
computing the sort ordering needed for a mergejoin. By eliminating the
redundancy, we save time and improve planning, since the planner will more
easily recognize equivalent orderings as being equivalent.

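Removing the redundant columns amounts to keeping only the first PathKey for each EquivalenceClass. A minimal sketch, assuming canonical classes that can be compared by pointer; `ToyEC`, `ToyPathKey` and `remove_redundant_pathkeys` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    const char *name;
} ToyEC;

typedef struct
{
    const ToyEC *eclass;
    bool         descending;
} ToyPathKey;

/*
 * Drop any key whose EC already appeared earlier in the list, whatever its
 * direction: the earlier key already orders every pair of rows whose
 * values differ.  Compacts keys[] in place and returns the new length.
 */
static int
remove_redundant_pathkeys(ToyPathKey *keys, int nkeys)
{
    int nout = 0;

    assert(nkeys >= 0);
    for (int i = 0; i < nkeys; i++)
    {
        bool seen = false;

        for (int j = 0; j < nout; j++)
            if (keys[j].eclass == keys[i].eclass)
                seen = true;
        if (!seen)
            keys[nout++] = keys[i];
    }
    return nout;
}
```

With keys ({x y} ASC, {x y} DESC, {z} ASC) the result is ({x y} ASC, {z} ASC).
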
Another interesting property is that if the underlying EquivalenceClass
contains a constant and is not below an outer join, then the pathkey is
completely redundant and need not be sorted by at all! Every row must
contain the same constant value, so there's no need to sort. (If the EC is
below an outer join, we still have to sort, since some of the rows might
have gone to null and others not. In this case we must be careful to pick
a non-const member to sort by. The assumption that all the non-const
members go to null at the same plan level is critical here, else they might
not produce the same sort order.) This might seem pointless because users
are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
recognize when particular index columns are irrelevant to the sort order:
if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
produces correctly ordered data without a sort step. We used to have very
ugly ad-hoc code to recognize that in limited contexts, but discarding
constant ECs from pathkeys makes it happen cleanly and automatically.

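The constant-EC rule and the index example can be sketched together. The names are hypothetical, and `ordering_satisfies` assumes that a path meets a requested ordering when the requested keys form a prefix of the path's keys.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical EquivalenceClass summary: only the two properties that
 * matter for this rule. */
typedef struct
{
    const char *name;
    bool        has_const;         /* a member is a constant */
    bool        below_outer_join;  /* formed below an outer join */
} ToyEC;

/* Drop sort keys whose class pins every row to one constant value.
 * Compacts keys[] in place and returns the new length. */
static int
drop_constant_pathkeys(const ToyEC **keys, int nkeys)
{
    int nout = 0;

    assert(nkeys >= 0);
    for (int i = 0; i < nkeys; i++)
    {
        if (keys[i]->has_const && !keys[i]->below_outer_join)
            continue;   /* all rows share the value: nothing to sort */
        keys[nout++] = keys[i];
    }
    return nout;
}

/* A path ordered by have[] satisfies a request for want[] when want[] is
 * a prefix of have[]; canonical classes compare by pointer. */
static bool
ordering_satisfies(const ToyEC *const *have, int nhave,
                   const ToyEC *const *want, int nwant)
{
    if (nwant > nhave)
        return false;
    for (int i = 0; i < nwant; i++)
        if (have[i] != want[i])
            return false;
    return true;
}
```

For an index on (x,y) under WHERE x = 42, the index's keys ({x 42}, {y}) shrink to ({y}), which matches ORDER BY y, so no explicit sort is needed; a below-outer-join class such as {b.y 42} is left in place.
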
You might object that a below-outer-join EquivalenceClass doesn't always
represent the same values at every level of the join tree, and so using
it to uniquely identify a sort order is dubious. This is true, but we
can avoid dealing with the fact explicitly because we always consider that
an outer join destroys any ordering of its nullable inputs. Thus, even
if a path was sorted by {a.x} below an outer join, we'll re-sort if that
sort ordering was important; and so using the same PathKey for both sort
orderings doesn't create any real problem.


Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
coding of our optimizer, he is available to field questions about
optimizer topics.

-- bjm & tgl