642 lines
33 KiB
Plaintext
642 lines
33 KiB
Plaintext
$PostgreSQL: pgsql/src/backend/optimizer/README,v 1.52 2009/09/29 01:20:34 tgl Exp $
|
|
|
|
Optimizer
|
|
=========
|
|
|
|
These directories take the Query structure returned by the parser, and
|
|
generate a plan used by the executor. The /plan directory generates the
|
|
actual output plan, the /path code generates all possible ways to join the
|
|
tables, and /prep handles various preprocessing steps for special cases.
|
|
/util is utility stuff. /geqo is the separate "genetic optimization" planner
|
|
--- it does a semi-random search through the join tree space, rather than
|
|
exhaustively considering all possible join trees. (But each join considered
|
|
by /geqo is given to /path to create paths for, so we consider all possible
|
|
implementation paths for each specific join pair even in GEQO mode.)
|
|
|
|
|
|
Paths and Join Pairs
|
|
--------------------
|
|
|
|
During the planning/optimizing process, we build "Path" trees representing
|
|
the different ways of doing a query. We select the cheapest Path that
|
|
generates the desired relation and turn it into a Plan to pass to the
|
|
executor. (There is pretty much a one-to-one correspondence between the
|
|
Path and Plan trees, but Path nodes omit info that won't be needed during
|
|
planning, and include info needed for planning that won't be needed by the
|
|
executor.)
|
|
|
|
The optimizer builds a RelOptInfo structure for each base relation used in
|
|
the query. Base rels are either primitive tables, or subquery subselects
|
|
that are planned via a separate recursive invocation of the planner. A
|
|
RelOptInfo is also built for each join relation that is considered during
|
|
planning. A join rel is simply a combination of base rels. There is only
|
|
one join RelOptInfo for any given set of baserels --- for example, the join
|
|
{A B C} is represented by the same RelOptInfo no matter whether we build it
|
|
by joining A and B first and then adding C, or joining B and C first and
|
|
then adding A, etc. These different means of building the joinrel are
|
|
represented as Paths. For each RelOptInfo we build a list of Paths that
|
|
represent plausible ways to implement the scan or join of that relation.
|
|
Once we've considered all the plausible Paths for a rel, we select the one
|
|
that is cheapest according to the planner's cost estimates. The final plan
|
|
is derived from the cheapest Path for the RelOptInfo that includes all the
|
|
base rels of the query.
|
|
|
|
Possible Paths for a primitive table relation include plain old sequential
|
|
scan, plus index scans for any indexes that exist on the table, plus bitmap
|
|
index scans using one or more indexes. A subquery base relation just has
|
|
one Path, a "SubqueryScan" path (which links to the subplan that was built
|
|
by a recursive invocation of the planner). Likewise a function-RTE base
|
|
relation has only one possible Path.
|
|
|
|
Joins always occur using two RelOptInfos. One is outer, the other inner.
|
|
Outers drive lookups of values in the inner. In a nested loop, lookups of
|
|
values in the inner occur by scanning the inner path once per outer tuple
|
|
to find each matching inner row. In a mergejoin, inner and outer rows are
|
|
ordered, and are accessed in order, so only one scan is required to perform
|
|
the entire join: both inner and outer paths are scanned in-sync. (There's
|
|
not a lot of difference between inner and outer in a mergejoin...) In a
|
|
hashjoin, the inner is scanned first and all its rows are entered in a
|
|
hashtable, then the outer is scanned and for each row we lookup the join
|
|
key in the hashtable.
|
|
|
|
A Path for a join relation is actually a tree structure, with the top
|
|
Path node representing the join method. It has left and right subpaths
|
|
that represent the scan or join methods used for the two input relations.
|
|
|
|
|
|
Join Tree Construction
|
|
----------------------
|
|
|
|
The optimizer generates optimal query plans by doing a more-or-less
|
|
exhaustive search through the ways of executing the query. The best Path
|
|
tree is found by a recursive process:
|
|
|
|
1) Take each base relation in the query, and make a RelOptInfo structure
|
|
for it. Find each potentially useful way of accessing the relation,
|
|
including sequential and index scans, and make Paths representing those
|
|
ways. All the Paths made for a given relation are placed in its
|
|
RelOptInfo.pathlist. (Actually, we discard Paths that are obviously
|
|
inferior alternatives before they ever get into the pathlist --- what
|
|
ends up in the pathlist is the cheapest way of generating each potentially
|
|
useful sort ordering of the relation.) Also create a RelOptInfo.joininfo
|
|
list including all the join clauses that involve this relation. For
|
|
example, the WHERE clause "tab1.col1 = tab2.col1" generates entries in
|
|
both tab1 and tab2's joininfo lists.
|
|
|
|
If we have only a single base relation in the query, we are done.
|
|
Otherwise we have to figure out how to join the base relations into a
|
|
single join relation.
|
|
|
|
2) Normally, any explicit JOIN clauses are "flattened" so that we just
|
|
have a list of relations to join. However, FULL OUTER JOIN clauses are
|
|
never flattened, and other kinds of JOIN might not be either, if the
|
|
flattening process is stopped by join_collapse_limit or from_collapse_limit
|
|
restrictions. Therefore, we end up with a planning problem that contains
|
|
lists of relations to be joined in any order, where any individual item
|
|
might be a sub-list that has to be joined together before we can consider
|
|
joining it to its siblings. We process these sub-problems recursively,
|
|
bottom up. Note that the join list structure constrains the possible join
|
|
orders, but it doesn't constrain the join implementation method at each
|
|
join (nestloop, merge, hash), nor does it say which rel is considered outer
|
|
or inner at each join. We consider all these possibilities in building
|
|
Paths. We generate a Path for each feasible join method, and select the
|
|
cheapest Path.
|
|
|
|
For each planning problem, therefore, we will have a list of relations
|
|
that are either base rels or joinrels constructed per sub-join-lists.
|
|
We can join these rels together in any order the planner sees fit.
|
|
The standard (non-GEQO) planner does this as follows:
|
|
|
|
Consider joining each RelOptInfo to each other RelOptInfo for which there
|
|
is a usable joinclause, and generate a Path for each possible join method
|
|
for each such pair. (If we have a RelOptInfo with no join clauses, we have
|
|
no choice but to generate a clauseless Cartesian-product join; so we
|
|
consider joining that rel to each other available rel. But in the presence
|
|
of join clauses we will only consider joins that use available join
|
|
clauses. Note that join-order restrictions induced by outer joins and
|
|
IN/EXISTS clauses are also checked, to ensure that we find a workable join
|
|
order in cases where those restrictions force a clauseless join to be done.)
|
|
|
|
If we only had two relations in the list, we are done: we just pick
|
|
the cheapest path for the join RelOptInfo. If we had more than two, we now
|
|
need to consider ways of joining join RelOptInfos to each other to make
|
|
join RelOptInfos that represent more than two list items.
|
|
|
|
The join tree is constructed using a "dynamic programming" algorithm:
|
|
in the first pass (already described) we consider ways to create join rels
|
|
representing exactly two list items. The second pass considers ways
|
|
to make join rels that represent exactly three list items; the next pass,
|
|
four items, etc. The last pass considers how to make the final join
|
|
relation that includes all list items --- obviously there can be only one
|
|
join rel at this top level, whereas there can be more than one join rel
|
|
at lower levels. At each level we use joins that follow available join
|
|
clauses, if possible, just as described for the first level.
|
|
|
|
For example:
|
|
|
|
SELECT *
|
|
FROM tab1, tab2, tab3, tab4
|
|
WHERE tab1.col = tab2.col AND
|
|
tab2.col = tab3.col AND
|
|
tab3.col = tab4.col
|
|
|
|
Tables 1, 2, 3, and 4 are joined as:
|
|
{1 2},{2 3},{3 4}
|
|
{1 2 3},{2 3 4}
|
|
{1 2 3 4}
|
|
(other possibilities will be excluded for lack of join clauses)
|
|
|
|
SELECT *
|
|
FROM tab1, tab2, tab3, tab4
|
|
WHERE tab1.col = tab2.col AND
|
|
tab1.col = tab3.col AND
|
|
tab1.col = tab4.col
|
|
|
|
Tables 1, 2, 3, and 4 are joined as:
|
|
{1 2},{1 3},{1 4}
|
|
{1 2 3},{1 3 4},{1 2 4}
|
|
{1 2 3 4}
|
|
|
|
We consider left-handed plans (the outer rel of an upper join is a joinrel,
|
|
but the inner is always a single list item); right-handed plans (outer rel
|
|
is always a single item); and bushy plans (both inner and outer can be
|
|
joins themselves). For example, when building {1 2 3 4} we consider
|
|
joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
|
|
{1 2} to {3 4} (bushy), among other choices. Although the jointree
|
|
scanning code produces these potential join combinations one at a time,
|
|
all the ways to produce the same set of joined base rels will share the
|
|
same RelOptInfo, so the paths produced from different join combinations
|
|
that produce equivalent joinrels will compete in add_path().
|
|
|
|
Once we have built the final join rel, we use either the cheapest path
|
|
for it or the cheapest path with the desired ordering (if that's cheaper
|
|
than applying a sort to the cheapest other path).
|
|
|
|
If the query contains one-sided outer joins (LEFT or RIGHT joins), or
|
|
IN or EXISTS WHERE clauses that were converted to joins, then some of
|
|
the possible join orders may be illegal. These are excluded by having
|
|
join_is_legal consult a side list of such "special" joins to see
|
|
whether a proposed join is illegal. (The same consultation allows it
|
|
to see which join style should be applied for a valid join, ie,
|
|
JOIN_INNER, JOIN_LEFT, etc.)
|
|
|
|
|
|
Valid OUTER JOIN Optimizations
|
|
------------------------------
|
|
|
|
The planner's treatment of outer join reordering is based on the following
|
|
identities:
|
|
|
|
1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
|
|
= (A innerjoin C on (Pac)) leftjoin B on (Pab)
|
|
|
|
where Pac is a predicate referencing A and C, etc (in this case, clearly
|
|
Pac cannot reference B, or the transformation is nonsensical).
|
|
|
|
2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
|
|
= (A leftjoin C on (Pac)) leftjoin B on (Pab)
|
|
|
|
3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
|
|
= A leftjoin (B leftjoin C on (Pbc)) on (Pab)
|
|
|
|
Identity 3 only holds if predicate Pbc must fail for all-null B rows
|
|
(that is, Pbc is strict for at least one column of B). If Pbc is not
|
|
strict, the first form might produce some rows with nonnull C columns
|
|
where the second form would make those entries null.
|
|
|
|
RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
|
|
tables, so the same identities work for right joins.
|
|
|
|
An example of a case that does *not* work is moving an innerjoin into or
|
|
out of the nullable side of an outer join:
|
|
|
|
A leftjoin (B join C on (Pbc)) on (Pab)
|
|
!= (A leftjoin B on (Pab)) join C on (Pbc)
|
|
|
|
SEMI joins work a little bit differently. A semijoin can be reassociated
|
|
into or out of the lefthand side of another semijoin, left join, or
|
|
antijoin, but not into or out of the righthand side. Likewise, an inner
|
|
join, left join, or antijoin can be reassociated into or out of the
|
|
lefthand side of a semijoin, but not into or out of the righthand side.
|
|
|
|
ANTI joins work approximately like LEFT joins, except that identity 3
|
|
fails if the join to C is an antijoin (even if Pbc is strict, and in
|
|
both the cases where the other join is a leftjoin and where it is an
|
|
antijoin). So we can't reorder antijoins into or out of the RHS of a
|
|
leftjoin or antijoin, even if the relevant clause is strict.
|
|
|
|
The current code does not attempt to re-order FULL JOINs at all.
|
|
FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
|
|
translating the jointree to "joinlist" representation. Other types of
|
|
JOIN nodes are normally collapsed so that they participate fully in the
|
|
join order search. To avoid generating illegal join orders, the planner
|
|
creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
|
|
checks this list to decide if a proposed join is legal.
|
|
|
|
What we store in SpecialJoinInfo nodes are the minimum sets of Relids
|
|
required on each side of the join to form the outer join. Note that
|
|
these are minimums; there's no explicit maximum, since joining other
|
|
rels to the OJ's syntactic rels may be legal. Per identities 1 and 2,
|
|
non-FULL joins can be freely associated into the lefthand side of an
|
|
OJ, but in some cases they can't be associated into the righthand side.
|
|
So the restriction enforced by join_is_legal is that a proposed join
|
|
can't join a rel within or partly within an RHS boundary to one outside
|
|
the boundary, unless the join validly implements some outer join.
|
|
(To support use of identity 3, we have to allow cases where an apparent
|
|
violation of a lower OJ's RHS is committed while forming an upper OJ.
|
|
If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
|
|
set must be expanded to include the whole of the lower OJ, thereby
|
|
preventing it from being formed before the lower OJ is.)
|
|
|
|
|
|
Pulling Up Subqueries
|
|
---------------------
|
|
|
|
As we described above, a subquery appearing in the range table is planned
|
|
independently and treated as a "black box" during planning of the outer
|
|
query. This is necessary when the subquery uses features such as
|
|
aggregates, GROUP, or DISTINCT. But if the subquery is just a simple
|
|
scan or join, treating the subquery as a black box may produce a poor plan
|
|
compared to considering it as part of the entire plan search space.
|
|
Therefore, at the start of the planning process the planner looks for
|
|
simple subqueries and pulls them up into the main query's jointree.
|
|
|
|
Pulling up a subquery may result in FROM-list joins appearing below the top
|
|
of the join tree. Each FROM-list is planned using the dynamic-programming
|
|
search method described above.
|
|
|
|
If pulling up a subquery produces a FROM-list as a direct child of another
|
|
FROM-list, then we can merge the two FROM-lists together. Once that's
|
|
done, the subquery is an absolutely integral part of the outer query and
|
|
will not constrain the join tree search space at all. However, that could
|
|
result in unpleasant growth of planning time, since the dynamic-programming
|
|
search has runtime exponential in the number of FROM-items considered.
|
|
Therefore, we don't merge FROM-lists if the result would have too many
|
|
FROM-items in one list.
|
|
|
|
|
|
Optimizer Functions
|
|
-------------------
|
|
|
|
The primary entry point is planner().
|
|
|
|
planner()
|
|
set up for recursive handling of subqueries
|
|
do final cleanup after planning
|
|
-subquery_planner()
|
|
pull up sublinks and subqueries from rangetable, if possible
|
|
canonicalize qual
|
|
Attempt to simplify WHERE clause to the most useful form; this includes
|
|
flattening nested AND/ORs and detecting clauses that are duplicated in
|
|
different branches of an OR.
|
|
simplify constant expressions
|
|
process sublinks
|
|
convert Vars of outer query levels into Params
|
|
--grouping_planner()
|
|
preprocess target list for non-SELECT queries
|
|
handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
|
|
ORDER BY, DISTINCT, LIMIT
|
|
--query_planner()
|
|
pull out constant quals, which can be used to gate execution of the
|
|
whole plan (if any are found, we make a top-level Result node
|
|
to do the gating)
|
|
make list of base relations used in query
|
|
split up the qual into restrictions (a=1) and joins (b=c)
|
|
find qual clauses that enable merge and hash joins
|
|
----make_one_rel()
|
|
set_base_rel_pathlist()
|
|
find seqscan and all index paths for each base relation
|
|
find selectivity of columns used in joins
|
|
make_rel_from_joinlist()
|
|
hand off join subproblems to a plugin, GEQO, or standard_join_search()
|
|
-----standard_join_search()
|
|
call join_search_one_level() for each level of join tree needed
|
|
join_search_one_level():
|
|
For each joinrel of the prior level, do make_rels_by_clause_joins()
|
|
if it has join clauses, or make_rels_by_clauseless_joins() if not.
|
|
Also generate "bushy plan" joins between joinrels of lower levels.
|
|
Back at standard_join_search(), apply set_cheapest() to extract the
|
|
cheapest path for each newly constructed joinrel.
|
|
Loop back if this wasn't the top join level.
|
|
Back at query_planner:
|
|
put back any constant quals by adding a Result node
|
|
Back at grouping_planner:
|
|
do grouping(GROUP)
|
|
do aggregates
|
|
make unique(DISTINCT)
|
|
make sort(ORDER BY)
|
|
make limit(LIMIT/OFFSET)
|
|
|
|
|
|
Optimizer Data Structures
|
|
-------------------------
|
|
|
|
PlannerGlobal - global information for a single planner invocation
|
|
|
|
PlannerInfo - information for planning a particular Query (we make
|
|
a separate PlannerInfo node for each sub-Query)
|
|
|
|
RelOptInfo - a relation or joined relations
|
|
|
|
RestrictInfo - WHERE clauses, like "x = 3" or "y = z"
|
|
(note the same structure is used for restriction and
|
|
join clauses)
|
|
|
|
Path - every way to generate a RelOptInfo(sequential,index,joins)
|
|
SeqScan - a plain Path node with pathtype = T_SeqScan
|
|
IndexPath - index scans
|
|
BitmapHeapPath - top of a bitmapped index scan
|
|
TidPath - scan by CTID
|
|
AppendPath - append multiple subpaths together
|
|
ResultPath - a Result plan node (used for FROM-less SELECT)
|
|
MaterialPath - a Material plan node
|
|
UniquePath - remove duplicate rows
|
|
NestPath - nested-loop joins
|
|
MergePath - merge joins
|
|
HashPath - hash joins
|
|
NoOpPath - same as its input path (used when a join is removed)
|
|
|
|
EquivalenceClass - a data structure representing a set of values known equal
|
|
|
|
PathKey - a data structure representing the sort ordering of a path
|
|
|
|
The optimizer spends a good deal of its time worrying about the ordering
|
|
of the tuples returned by a path. The reason this is useful is that by
|
|
knowing the sort ordering of a path, we may be able to use that path as
|
|
the left or right input of a mergejoin and avoid an explicit sort step.
|
|
Nestloops and hash joins don't really care what the order of their inputs
|
|
is, but mergejoin needs suitably ordered inputs. Therefore, all paths
|
|
generated during the optimization process are marked with their sort order
|
|
(to the extent that it is known) for possible use by a higher-level merge.
|
|
|
|
It is also possible to avoid an explicit sort step to implement a user's
|
|
ORDER BY clause if the final path has the right ordering already, so the
|
|
sort ordering is of interest even at the top level. query_planner() will
|
|
look for the cheapest path with a sort order matching the desired order,
|
|
and grouping_planner() will compare its cost to the cost of using the
|
|
cheapest-overall path and doing an explicit sort.
|
|
|
|
When we are generating paths for a particular RelOptInfo, we discard a path
|
|
if it is more expensive than another known path that has the same or better
|
|
sort order. We will never discard a path that is the only known way to
|
|
achieve a given sort order (without an explicit sort, that is). In this
|
|
way, the next level up will have the maximum freedom to build mergejoins
|
|
without sorting, since it can pick from any of the paths retained for its
|
|
inputs.
|
|
|
|
|
|
EquivalenceClasses
|
|
------------------
|
|
|
|
During the deconstruct_jointree() scan of the query's qual clauses, we look
|
|
for mergejoinable equality clauses A = B whose applicability is not delayed
|
|
by an outer join; these are called "equivalence clauses". When we find
|
|
one, we create an EquivalenceClass containing the expressions A and B to
|
|
record this knowledge. If we later find another equivalence clause B = C,
|
|
we add C to the existing EquivalenceClass for {A B}; this may require
|
|
merging two existing EquivalenceClasses. At the end of the scan, we have
|
|
sets of values that are known all transitively equal to each other. We can
|
|
therefore use a comparison of any pair of the values as a restriction or
|
|
join clause (when these values are available at the scan or join, of
|
|
course); furthermore, we need test only one such comparison, not all of
|
|
them. Therefore, equivalence clauses are removed from the standard qual
|
|
distribution process. Instead, when preparing a restriction or join clause
|
|
list, we examine each EquivalenceClass to see if it can contribute a
|
|
clause, and if so we select an appropriate pair of values to compare. For
|
|
example, if we are trying to join A's relation to C's, we can generate the
|
|
clause A = C, even though this appeared nowhere explicitly in the original
|
|
query. This may allow us to explore join paths that otherwise would have
|
|
been rejected as requiring Cartesian-product joins.
|
|
|
|
Sometimes an EquivalenceClass may contain a pseudo-constant expression
|
|
(i.e., one not containing Vars or Aggs of the current query level, nor
|
|
volatile functions). In this case we do not follow the policy of
|
|
dynamically generating join clauses: instead, we dynamically generate
|
|
restriction clauses "var = const" wherever one of the variable members of
|
|
the class can first be computed. For example, if we have A = B and B = 42,
|
|
we effectively generate the restriction clauses A = 42 and B = 42, and then
|
|
we need not bother with explicitly testing the join clause A = B when the
|
|
relations are joined. In effect, all the class members can be tested at
|
|
relation-scan level and there's never a need for join tests.
|
|
|
|
The precise technical interpretation of an EquivalenceClass is that it
|
|
asserts that at any plan node where more than one of its member values
|
|
can be computed, output rows in which the values are not all equal may
|
|
be discarded without affecting the query result. (We require all levels
|
|
of the plan to enforce EquivalenceClasses, hence a join need not recheck
|
|
equality of values that were computable by one of its children.) For an
|
|
ordinary EquivalenceClass that is "valid everywhere", we can further infer
|
|
that the values are all non-null, because all mergejoinable operators are
|
|
strict. However, we also allow equivalence clauses that appear below the
|
|
nullable side of an outer join to form EquivalenceClasses; for these
|
|
classes, the interpretation is that either all the values are equal, or
|
|
all (except pseudo-constants) have gone to null. (This requires a
|
|
limitation that non-constant members be strict, else they might not go
|
|
to null when the other members do.) Consider for example
|
|
|
|
SELECT *
|
|
FROM a LEFT JOIN
|
|
(SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
|
|
ON a.x = ss.y
|
|
WHERE a.x = 42;
|
|
|
|
We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
|
|
apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed
|
|
clauses from forming EquivalenceClasses is exactly that we want to be able
|
|
to push any derived clauses as far down as possible.) But once above the
|
|
outer join it's no longer necessarily the case that b.y = 10, and thus we
|
|
cannot use such EquivalenceClasses to conclude that sorting is unnecessary
|
|
(see discussion of PathKeys below).
|
|
|
|
In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
|
|
equivalence clause because its applicability to b is delayed by the outer
|
|
join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
|
|
But since we see that a.x has been equated to 42 above the outer join, we
|
|
are able to form a below-outer-join class {b.y 42}; this restriction can be
|
|
added because no b/c row not having b.y = 42 can contribute to the result
|
|
of the outer join, and so we need not compute such rows. Now this class
|
|
will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
|
|
which lets the planner deduce that the b/c join need not be computed at all
|
|
because none of its rows can contribute to the outer join. (This gets
|
|
implemented as a gating Result filter, since more usually the potential
|
|
contradiction involves Param values rather than just Consts, and thus has
|
|
to be checked at runtime.)
|
|
|
|
To aid in determining the sort ordering(s) that can work with a mergejoin,
|
|
we mark each mergejoinable clause with the EquivalenceClasses of its left
|
|
and right inputs. For an equivalence clause, these are of course the same
|
|
EquivalenceClass. For a non-equivalence mergejoinable clause (such as an
|
|
outer-join qualification), we generate two separate EquivalenceClasses for
|
|
the left and right inputs. This may result in creating single-item
|
|
equivalence "classes", though of course these are still subject to merging
|
|
if other equivalence clauses are later found to bear on the same
|
|
expressions.
|
|
|
|
Another way that we may form a single-item EquivalenceClass is in creation
|
|
of a PathKey to represent a desired sort order (see below). This is a bit
|
|
different from the above cases because such an EquivalenceClass might
|
|
contain an aggregate function or volatile expression. (A clause containing
|
|
a volatile function will never be considered mergejoinable, even if its top
|
|
operator is mergejoinable, so there is no way for a volatile expression to
|
|
get into EquivalenceClasses otherwise. Aggregates are disallowed in WHERE
|
|
altogether, so will never be found in a mergejoinable clause.) This is just
|
|
a convenience to maintain a uniform PathKey representation: such an
|
|
EquivalenceClass will never be merged with any other. Note in particular
|
|
that a single-item EquivalenceClass {a.x} is *not* meant to imply an
|
|
assertion that a.x = a.x; the practical effect of this is that a.x could
|
|
be NULL.
|
|
|
|
An EquivalenceClass also contains a list of btree opfamily OIDs, which
|
|
determines what the equalities it represents actually "mean". All the
|
|
equivalence clauses that contribute to an EquivalenceClass must have
|
|
equality operators that belong to the same set of opfamilies. (Note: most
|
|
of the time, a particular equality operator belongs to only one family, but
|
|
it's possible that it belongs to more than one. We keep track of all the
|
|
families to ensure that we can make use of an index belonging to any one of
|
|
the families for mergejoin purposes.)
|
|
|
|
|
|
PathKeys
|
|
--------
|
|
|
|
The PathKeys data structure represents what is known about the sort order
|
|
of the tuples generated by a particular Path. A path's pathkeys field is a
|
|
list of PathKey nodes, where the n'th item represents the n'th sort key of
|
|
the result. Each PathKey contains these fields:
|
|
|
|
* a reference to an EquivalenceClass
|
|
* a btree opfamily OID (must match one of those in the EC)
|
|
* a sort direction (ascending or descending)
|
|
* a nulls-first-or-last flag
|
|
|
|
The EquivalenceClass represents the value being sorted on. Since the
|
|
various members of an EquivalenceClass are known equal according to the
|
|
opfamily, we can consider a path sorted by any one of them to be sorted by
|
|
any other too; this is what justifies referencing the whole
|
|
EquivalenceClass rather than just one member of it.
|
|
|
|
In single/base relation RelOptInfo's, the Paths represent various ways
|
|
of scanning the relation and the resulting ordering of the tuples.
|
|
Sequential scan Paths have NIL pathkeys, indicating no known ordering.
|
|
Index scans have Path.pathkeys that represent the chosen index's ordering,
|
|
if any. A single-key index would create a single-PathKey list, while a
|
|
multi-column index generates a list with one element per index column.
|
|
(Actually, since an index can be scanned either forward or backward, there
|
|
are two possible sort orders and two possible PathKey lists it can
|
|
generate.)
|
|
|
|
Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
|
|
pathkeys since we can say nothing about the overall order of its result.
|
|
Also, an indexscan on an unordered type of index generates NIL pathkeys.
|
|
However, we can always create a pathkey by doing an explicit sort. The
|
|
pathkeys for a Sort plan's output just represent the sort key fields and
|
|
the ordering operators used.
|
|
|
|
Things get more interesting when we consider joins. Suppose we do a
|
|
mergejoin between A and B using the mergeclause A.X = B.Y. The output
|
|
of the mergejoin is sorted by X --- but it is also sorted by Y. Again,
|
|
this can be represented by a PathKey referencing an EquivalenceClass
|
|
containing both X and Y.
|
|
|
|
With a little further thought, it becomes apparent that nestloop joins
|
|
can also produce sorted output. For example, if we do a nestloop join
|
|
between outer relation A and inner relation B, then any pathkeys relevant
|
|
to A are still valid for the join result: we have not altered the order of
|
|
the tuples from A. Even more interesting, if there was an equivalence clause
|
|
A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
|
|
that B.Y is a pathkey for the join result; X was ordered before and still
|
|
is, and the joined values of Y are equal to the joined values of X, so Y
|
|
must now be ordered too. This is true even though we used neither an
|
|
explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted
|
|
on to preserve the order of their outer relation, because the executor
|
|
might decide to "batch" the join, so we always set pathkeys to NIL for
|
|
a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the
|
|
ordering of its outer relation, because it might insert nulls at random
|
|
points in the ordering.
|
|
|
|
In general, we can justify using EquivalenceClasses as the basis for
|
|
pathkeys because, whenever we scan a relation containing multiple
|
|
EquivalenceClass members or join two relations each containing
|
|
EquivalenceClass members, we apply restriction or join clauses derived from
|
|
the EquivalenceClass. This guarantees that any two values listed in the
|
|
EquivalenceClass are in fact equal in all tuples emitted by the scan or
|
|
join, and therefore that if the tuples are sorted by one of the values,
|
|
they can be considered sorted by any other as well. It does not matter
|
|
whether the test clause is used as a mergeclause, or merely enforced
|
|
after-the-fact as a qpqual filter.
|
|
|
|
Note that there is no particular difficulty in labeling a path's sort
|
|
order with a PathKey referencing an EquivalenceClass that contains
|
|
variables not yet joined into the path's output. We can simply ignore
|
|
such entries as not being relevant (yet). This makes it possible to
|
|
use the same EquivalenceClasses throughout the join planning process.
|
|
In fact, by being careful not to generate multiple identical PathKey
|
|
objects, we can reduce comparison of EquivalenceClasses and PathKeys
|
|
to simple pointer comparison, which is a huge savings because add_path
|
|
has to make a large number of PathKey comparisons in deciding whether
|
|
competing Paths are equivalently sorted.
|
|
|
|
Pathkeys are also useful to represent an ordering that we wish to achieve,
|
|
since they are easily compared to the pathkeys of a potential candidate
|
|
path. So, SortGroupClause lists are turned into pathkeys lists for use
|
|
inside the optimizer.
|
|
|
|
Because we have to generate pathkeys lists from the sort clauses before
|
|
we've finished EquivalenceClass merging, we cannot use the pointer-equality
|
|
method of comparing PathKeys in the earliest stages of the planning
|
|
process. Instead, we generate "non canonical" PathKeys that reference
|
|
single-element EquivalenceClasses that might get merged later. After we
|
|
complete EquivalenceClass merging, we replace these with "canonical"
|
|
PathKeys that reference only fully-merged classes, and after that we make
|
|
sure we don't generate more than one copy of each "canonical" PathKey.
|
|
Then it is safe to use pointer comparison on canonical PathKeys.
|
|
|
|
An additional refinement we can make is to insist that canonical pathkey
|
|
lists (sort orderings) do not mention the same EquivalenceClass more than
|
|
once. For example, in all these cases the second sort column is redundant,
|
|
because it cannot distinguish values that are the same according to the
|
|
first sort column:
|
|
SELECT ... ORDER BY x, x
|
|
SELECT ... ORDER BY x, x DESC
|
|
SELECT ... WHERE x = y ORDER BY x, y
|
|
Although a user probably wouldn't write "ORDER BY x,x" directly, such
|
|
redundancies are more probable once equivalence classes have been
|
|
considered. Also, the system may generate redundant pathkey lists when
|
|
computing the sort ordering needed for a mergejoin. By eliminating the
|
|
redundancy, we save time and improve planning, since the planner will more
|
|
easily recognize equivalent orderings as being equivalent.
|
|
|
|
Another interesting property is that if the underlying EquivalenceClass
|
|
contains a constant and is not below an outer join, then the pathkey is
|
|
completely redundant and need not be sorted by at all! Every row must
|
|
contain the same constant value, so there's no need to sort. (If the EC is
|
|
below an outer join, we still have to sort, since some of the rows might
|
|
have gone to null and others not. In this case we must be careful to pick
|
|
a non-const member to sort by. The assumption that all the non-const
|
|
members go to null at the same plan level is critical here, else they might
|
|
not produce the same sort order.) This might seem pointless because users
|
|
are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
|
|
recognize when particular index columns are irrelevant to the sort order:
|
|
if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
|
|
produces correctly ordered data without a sort step. We used to have very
|
|
ugly ad-hoc code to recognize that in limited contexts, but discarding
|
|
constant ECs from pathkeys makes it happen cleanly and automatically.
|
|
|
|
You might object that a below-outer-join EquivalenceClass doesn't always
|
|
represent the same values at every level of the join tree, and so using
|
|
it to uniquely identify a sort order is dubious. This is true, but we
|
|
can avoid dealing with the fact explicitly because we always consider that
|
|
an outer join destroys any ordering of its nullable inputs. Thus, even
|
|
if a path was sorted by {a.x} below an outer join, we'll re-sort if that
|
|
sort ordering was important; and so using the same PathKey for both sort
|
|
orderings doesn't create any real problem.
|
|
|
|
|
|
|
|
Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
|
|
coding of our optimizer, he is available to field questions about
|
|
optimizer topics.
|
|
|
|
-- bjm & tgl
|
|
|