1999-02-09 04:51:42 +01:00
|
|
|
Summary
|
|
|
|
-------
|
|
|
|
|
2000-02-07 05:41:04 +01:00
|
|
|
These directories take the Query structure returned by the parser, and
|
|
|
|
generate a plan used by the executor. The /plan directory generates the
|
|
|
|
actual output plan, the /path code generates all possible ways to join the
|
2000-11-12 01:37:02 +01:00
|
|
|
tables, and /prep handles various preprocessing steps for special cases.
|
|
|
|
/util is utility stuff. /geqo is the separate "genetic optimization" planner
|
|
|
|
--- it does a semi-random search through the join tree space, rather than
|
|
|
|
exhaustively considering all possible join trees. (But each join considered
|
|
|
|
by /geqo is given to /path to create paths for, so we consider all possible
|
2000-09-29 20:21:41 +02:00
|
|
|
implementation paths for each specific join pair even in GEQO mode.)
|
|
|
|
|
|
|
|
|
|
|
|
Paths and Join Pairs
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
During the planning/optimizing process, we build "Path" trees representing
|
|
|
|
the different ways of doing a query. We select the cheapest Path that
|
|
|
|
generates the desired relation and turn it into a Plan to pass to the
|
|
|
|
executor. (There is pretty much a one-to-one correspondence between the
|
|
|
|
Path and Plan trees, but Path nodes omit info that won't be needed during
|
|
|
|
planning, and include info needed for planning that won't be needed by the
|
|
|
|
executor.)
|
|
|
|
|
|
|
|
The optimizer builds a RelOptInfo structure for each base relation used in
|
|
|
|
the query. Base rels are either primitive tables, or subquery subselects
|
|
|
|
that are planned via a separate recursive invocation of the planner. A
|
|
|
|
RelOptInfo is also built for each join relation that is considered during
|
|
|
|
planning. A join rel is simply a combination of base rels. There is only
|
|
|
|
one join RelOptInfo for any given set of baserels --- for example, the join
|
|
|
|
{A B C} is represented by the same RelOptInfo no matter whether we build it
|
|
|
|
by joining A and B first and then adding C, or joining B and C first and
|
|
|
|
then adding A, etc. These different means of building the joinrel are
|
|
|
|
represented as Paths. For each RelOptInfo we build a list of Paths that
|
|
|
|
represent plausible ways to implement the scan or join of that relation.
|
|
|
|
Once we've considered all the plausible Paths for a rel, we select the one
|
|
|
|
that is cheapest according to the planner's cost estimates. The final plan
|
|
|
|
is derived from the cheapest Path for the RelOptInfo that includes all the
|
|
|
|
base rels of the query.
|
|
|
|
|
|
|
|
Possible Paths for a primitive table relation include plain old sequential
|
|
|
|
scan, plus index scans for any indexes that exist on the table. A subquery
|
|
|
|
base relation just has one Path, a "SubqueryScan" path (which links to the
|
2002-05-13 01:43:04 +02:00
|
|
|
subplan that was built by a recursive invocation of the planner). Likewise
|
|
|
|
a function-RTE base relation has only one possible Path.
|
2000-09-29 20:21:41 +02:00
|
|
|
|
|
|
|
Joins always occur using two RelOptInfos. One is outer, the other inner.
|
|
|
|
Outers drive lookups of values in the inner. In a nested loop, lookups of
|
|
|
|
values in the inner occur by scanning the inner path once per outer tuple
|
|
|
|
to find each matching inner row. In a mergejoin, inner and outer rows are
|
|
|
|
ordered, and are accessed in order, so only one scan is required to perform
|
|
|
|
the entire join: both inner and outer paths are scanned in-sync. (There's
|
|
|
|
not a lot of difference between inner and outer in a mergejoin...) In a
|
|
|
|
hashjoin, the inner is scanned first and all its rows are entered in a
|
|
|
|
hashtable, then the outer is scanned and for each row we lookup the join
|
|
|
|
key in the hashtable.
|
|
|
|
|
|
|
|
A Path for a join relation is actually a tree structure, with the top
|
|
|
|
Path node representing the join method. It has left and right subpaths
|
|
|
|
that represent the scan or join methods used for the two input relations.
|
2000-02-07 05:41:04 +01:00
|
|
|
|
|
|
|
|
|
|
|
Join Tree Construction
|
|
|
|
----------------------
|
|
|
|
|
1999-08-16 04:17:58 +02:00
|
|
|
The optimizer generates optimal query plans by doing a more-or-less
|
2000-09-29 20:21:41 +02:00
|
|
|
exhaustive search through the ways of executing the query. The best Path
|
|
|
|
tree is found by a recursive process:
|
1999-08-16 04:17:58 +02:00
|
|
|
|
|
|
|
1) Take each base relation in the query, and make a RelOptInfo structure
|
|
|
|
for it. Find each potentially useful way of accessing the relation,
|
|
|
|
including sequential and index scans, and make a Path representing that
|
|
|
|
way. All the Paths made for a given relation are placed in its
|
|
|
|
RelOptInfo.pathlist. (Actually, we discard Paths that are obviously
|
|
|
|
inferior alternatives before they ever get into the pathlist --- what
|
|
|
|
ends up in the pathlist is the cheapest way of generating each potentially
|
|
|
|
useful sort ordering of the relation.) Also create RelOptInfo.joininfo
|
2000-09-12 23:07:18 +02:00
|
|
|
nodes that list all the join clauses that involve this relation. For
|
|
|
|
example, the WHERE clause "tab1.col1 = tab2.col1" generates a JoinInfo
|
|
|
|
for tab1 listing tab2 as an unjoined relation, and also one for tab2
|
|
|
|
showing tab1 as an unjoined relation.
|
1999-08-16 04:17:58 +02:00
|
|
|
|
2001-01-17 07:41:31 +01:00
|
|
|
If we have only a single base relation in the query, we are done.
|
2000-02-07 05:41:04 +01:00
|
|
|
Otherwise we have to figure out how to join the base relations into a
|
|
|
|
single join relation.
|
|
|
|
|
2000-09-29 20:21:41 +02:00
|
|
|
2) If the query's FROM clause contains explicit JOIN clauses, we join
|
|
|
|
those pairs of relations in exactly the tree structure indicated by the
|
|
|
|
JOIN clauses. (This is absolutely necessary when dealing with outer JOINs.
|
|
|
|
For inner JOINs we have more flexibility in theory, but don't currently
|
|
|
|
exploit it in practice.) For each such join pair, we generate a Path
|
|
|
|
for each feasible join method, and select the cheapest Path. Note that
|
|
|
|
the JOIN clause structure determines the join Path structure, but it
|
|
|
|
doesn't constrain the join implementation method at each join (nestloop,
|
|
|
|
merge, hash), nor does it say which rel is considered outer or inner at
|
|
|
|
each join. We consider all these possibilities in building Paths.
|
|
|
|
|
|
|
|
3) At the top level of the FROM clause we will have a list of relations
|
|
|
|
that are either base rels or joinrels constructed per JOIN directives.
|
|
|
|
We can join these rels together in any order the planner sees fit.
|
|
|
|
The standard (non-GEQO) planner does this as follows:
|
|
|
|
|
|
|
|
Consider joining each RelOptInfo to each other RelOptInfo specified in its
|
|
|
|
RelOptInfo.joininfo, and generate a Path for each possible join method for
|
|
|
|
each such pair. (If we have a RelOptInfo with no join clauses, we have no
|
|
|
|
choice but to generate a clauseless Cartesian-product join; so we consider
|
|
|
|
joining that rel to each other available rel. But in the presence of join
|
|
|
|
clauses we will only consider joins that use available join clauses.)
|
|
|
|
|
|
|
|
If we only had two relations in the FROM list, we are done: we just pick
|
|
|
|
the cheapest path for the join RelOptInfo. If we had more than two, we now
|
1999-08-16 04:17:58 +02:00
|
|
|
need to consider ways of joining join RelOptInfos to each other to make
|
2000-09-29 20:21:41 +02:00
|
|
|
join RelOptInfos that represent more than two FROM items.
|
2000-02-07 05:41:04 +01:00
|
|
|
|
|
|
|
The join tree is constructed using a "dynamic programming" algorithm:
|
|
|
|
in the first pass (already described) we consider ways to create join rels
|
2000-09-29 20:21:41 +02:00
|
|
|
representing exactly two FROM items. The second pass considers ways
|
|
|
|
to make join rels that represent exactly three FROM items; the next pass,
|
|
|
|
four items, etc. The last pass considers how to make the final join
|
|
|
|
relation that includes all FROM items --- obviously there can be only one
|
2000-02-07 05:41:04 +01:00
|
|
|
join rel at this top level, whereas there can be more than one join rel
|
|
|
|
at lower levels. At each level we use joins that follow available join
|
|
|
|
clauses, if possible, just as described for the first level.
|
1999-08-16 04:17:58 +02:00
|
|
|
|
|
|
|
For example:
|
1999-02-09 04:51:42 +01:00
|
|
|
|
1999-02-15 23:19:01 +01:00
|
|
|
SELECT *
|
|
|
|
FROM tab1, tab2, tab3, tab4
|
|
|
|
WHERE tab1.col = tab2.col AND
|
|
|
|
tab2.col = tab3.col AND
|
|
|
|
tab3.col = tab4.col
|
|
|
|
|
|
|
|
Tables 1, 2, 3, and 4 are joined as:
|
|
|
|
{1 2},{2 3},{3 4}
|
|
|
|
{1 2 3},{2 3 4}
|
|
|
|
{1 2 3 4}
|
2000-02-07 05:41:04 +01:00
|
|
|
(other possibilities will be excluded for lack of join clauses)
|
1999-02-15 23:19:01 +01:00
|
|
|
|
|
|
|
SELECT *
|
|
|
|
FROM tab1, tab2, tab3, tab4
|
|
|
|
WHERE tab1.col = tab2.col AND
|
|
|
|
tab1.col = tab3.col AND
|
|
|
|
tab1.col = tab4.col
|
|
|
|
|
|
|
|
Tables 1, 2, 3, and 4 are joined as:
|
|
|
|
{1 2},{1 3},{1 4}
|
2000-02-07 05:41:04 +01:00
|
|
|
{1 2 3},{1 3 4},{1 2 4}
|
1999-02-15 23:19:01 +01:00
|
|
|
{1 2 3 4}
|
|
|
|
|
2000-02-07 05:41:04 +01:00
|
|
|
We consider left-handed plans (the outer rel of an upper join is a joinrel,
|
2000-09-29 20:21:41 +02:00
|
|
|
but the inner is always a single FROM item); right-handed plans (outer rel
|
|
|
|
is always a single item); and bushy plans (both inner and outer can be
|
|
|
|
joins themselves). For example, when building {1 2 3 4} we consider
|
|
|
|
joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
|
|
|
|
{1 2} to {3 4} (bushy), among other choices. Although the jointree
|
|
|
|
scanning code produces these potential join combinations one at a time,
|
|
|
|
all the ways to produce the same set of joined base rels will share the
|
|
|
|
same RelOptInfo, so the paths produced from different join combinations
|
|
|
|
that produce equivalent joinrels will compete in add_path.
|
2000-02-07 05:41:04 +01:00
|
|
|
|
|
|
|
Once we have built the final join rel, we use either the cheapest path
|
|
|
|
for it or the cheapest path with the desired ordering (if that's cheaper
|
|
|
|
than applying a sort to the cheapest other path).
|
|
|
|
|
2000-09-29 20:21:41 +02:00
|
|
|
|
|
|
|
Pulling up subqueries
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
As we described above, a subquery appearing in the range table is planned
|
|
|
|
independently and treated as a "black box" during planning of the outer
|
|
|
|
query. This is necessary when the subquery uses features such as
|
|
|
|
aggregates, GROUP, or DISTINCT. But if the subquery is just a simple
|
|
|
|
scan or join, treating the subquery as a black box may produce a poor plan
|
|
|
|
compared to considering it as part of the entire plan search space.
|
|
|
|
Therefore, at the start of the planning process the planner looks for
|
|
|
|
simple subqueries and pulls them up into the main query's jointree.
|
|
|
|
|
|
|
|
Pulling up a subquery may result in FROM-list joins appearing below the top
|
|
|
|
of the join tree. Each FROM-list is planned using the dynamic-programming
|
|
|
|
search method described above.
|
|
|
|
|
|
|
|
If pulling up a subquery produces a FROM-list as a direct child of another
|
|
|
|
FROM-list (with no explicit JOIN directives between), then we can merge the
|
|
|
|
two FROM-lists together. Once that's done, the subquery is an absolutely
|
|
|
|
integral part of the outer query and will not constrain the join tree
|
|
|
|
search space at all. However, that could result in unpleasant growth of
|
|
|
|
planning time, since the dynamic-programming search has runtime exponential
|
|
|
|
in the number of FROM-items considered. Therefore, we don't merge
|
|
|
|
FROM-lists if the result would have too many FROM-items in one list.
|
2000-09-12 23:07:18 +02:00
|
|
|
|
1999-02-08 05:29:25 +01:00
|
|
|
|
1999-02-04 04:19:11 +01:00
|
|
|
Optimizer Functions
|
|
|
|
-------------------
|
|
|
|
|
2000-03-21 06:12:12 +01:00
|
|
|
The primary entry point is planner().
|
|
|
|
|
1997-12-17 19:02:33 +01:00
|
|
|
planner()
|
2000-03-21 06:12:12 +01:00
|
|
|
set up for recursive handling of subqueries
|
|
|
|
do final cleanup after planning.
|
|
|
|
-subquery_planner()
|
2000-09-29 20:21:41 +02:00
|
|
|
pull up subqueries from rangetable, if possible
|
2000-03-21 06:12:12 +01:00
|
|
|
simplify constant expressions
|
|
|
|
canonicalize qual
|
2000-02-07 05:41:04 +01:00
|
|
|
Attempt to reduce WHERE clause to either CNF or DNF canonical form.
|
|
|
|
CNF (top-level-AND) is preferred, since the optimizer can then use
|
|
|
|
any of the AND subclauses to filter tuples; but quals that are in
|
|
|
|
or close to DNF form will suffer exponential expansion if we try to
|
|
|
|
force them to CNF. In pathological cases either transform may expand
|
|
|
|
the qual unreasonably; so we may have to leave it un-normalized,
|
|
|
|
thereby reducing the accuracy of selectivity estimates.
|
2000-03-21 06:12:12 +01:00
|
|
|
process sublinks
|
|
|
|
convert Vars of outer query levels into Params
|
2000-11-12 01:37:02 +01:00
|
|
|
--grouping_planner()
|
|
|
|
preprocess target list for non-SELECT queries
|
|
|
|
handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
|
|
|
|
ORDER BY, DISTINCT, LIMIT
|
2000-03-21 06:12:12 +01:00
|
|
|
--query_planner()
|
2000-09-29 20:21:41 +02:00
|
|
|
pull out constant quals, which can be used to gate execution of the
|
|
|
|
whole plan (if any are found, we make a top-level Result node
|
|
|
|
to do the gating)
|
2002-11-06 01:00:45 +01:00
|
|
|
make list of base relations used in query
|
|
|
|
split up the qual into restrictions (a=1) and joins (b=c)
|
|
|
|
find qual clauses that enable merge and hash joins
|
1999-02-15 23:19:01 +01:00
|
|
|
----make_one_rel()
|
|
|
|
set_base_rel_pathlist()
|
2000-09-12 23:07:18 +02:00
|
|
|
find scan and all index paths for each base relation
|
1999-02-15 23:19:01 +01:00
|
|
|
find selectivity of columns used in joins
|
|
|
|
-----make_one_rel_by_joins()
|
1997-12-17 19:02:33 +01:00
|
|
|
jump to geqo if needed
|
2000-02-07 05:41:04 +01:00
|
|
|
else call make_rels_by_joins() for each level of join tree needed
|
|
|
|
make_rels_by_joins():
|
|
|
|
For each joinrel of the prior level, do make_rels_by_clause_joins()
|
|
|
|
if it has join clauses, or make_rels_by_clauseless_joins() if not.
|
|
|
|
Also generate "bushy plan" joins between joinrels of lower levels.
|
|
|
|
Back at make_one_rel_by_joins(), apply set_cheapest() to extract the
|
|
|
|
cheapest path for each newly constructed joinrel.
|
|
|
|
Loop back if this wasn't the top join level.
|
2000-09-29 20:21:41 +02:00
|
|
|
Back at query_planner:
|
2002-11-06 01:00:45 +01:00
|
|
|
put back any constant quals by adding a Result node
|
2000-11-12 01:37:02 +01:00
|
|
|
Back at grouping_planner:
|
2000-09-29 20:21:41 +02:00
|
|
|
do grouping(GROUP)
|
|
|
|
do aggregates
|
1997-12-17 19:02:33 +01:00
|
|
|
make unique(DISTINCT)
|
|
|
|
make sort(ORDER BY)
|
2000-11-12 01:37:02 +01:00
|
|
|
make limit(LIMIT/OFFSET)
|
1999-02-03 21:15:53 +01:00
|
|
|
|
|
|
|
|
1999-08-16 04:17:58 +02:00
|
|
|
Optimizer Data Structures
|
|
|
|
-------------------------
|
1999-02-04 04:19:11 +01:00
|
|
|
|
1999-02-15 23:19:01 +01:00
|
|
|
RelOptInfo - a relation or joined relations
|
1999-02-04 04:19:11 +01:00
|
|
|
|
2003-01-15 20:35:48 +01:00
|
|
|
RestrictInfo - WHERE clauses, like "x = 3" or "y = z"
|
|
|
|
(note the same structure is used for restriction and
|
|
|
|
join clauses)
|
|
|
|
JoinInfo - join clauses associated with a particular pair of relations
|
1999-02-04 04:19:11 +01:00
|
|
|
|
1999-02-15 23:19:01 +01:00
|
|
|
Path - every way to generate a RelOptInfo(sequential,index,joins)
|
2002-11-06 01:00:45 +01:00
|
|
|
SeqScan - a plain Path node with pathtype = T_SeqScan
|
1999-02-15 23:19:01 +01:00
|
|
|
IndexPath - index scans
|
2002-11-06 01:00:45 +01:00
|
|
|
TidPath - scan by CTID
|
|
|
|
AppendPath - append multiple subpaths together
|
2002-11-30 06:21:03 +01:00
|
|
|
ResultPath - a Result plan node (used for variable-free tlist or qual)
|
|
|
|
MaterialPath - a Material plan node
|
1999-08-16 04:17:58 +02:00
|
|
|
NestPath - nested-loop joins
|
1999-02-15 23:19:01 +01:00
|
|
|
MergePath - merge joins
|
|
|
|
HashPath - hash joins
|
1999-02-04 02:47:02 +01:00
|
|
|
|
2000-02-07 05:41:04 +01:00
|
|
|
PathKeys - a data structure representing the ordering of a path
|
1999-08-16 04:17:58 +02:00
|
|
|
|
|
|
|
The optimizer spends a good deal of its time worrying about the ordering
|
|
|
|
of the tuples returned by a path. The reason this is useful is that by
|
|
|
|
knowing the sort ordering of a path, we may be able to use that path as
|
|
|
|
the left or right input of a mergejoin and avoid an explicit sort step.
|
|
|
|
Nestloops and hash joins don't really care what the order of their inputs
|
|
|
|
is, but mergejoin needs suitably ordered inputs. Therefore, all paths
|
|
|
|
generated during the optimization process are marked with their sort order
|
|
|
|
(to the extent that it is known) for possible use by a higher-level merge.
|
|
|
|
|
|
|
|
It is also possible to avoid an explicit sort step to implement a user's
|
2000-02-07 05:41:04 +01:00
|
|
|
ORDER BY clause if the final path has the right ordering already, so the
|
2002-11-06 01:00:45 +01:00
|
|
|
sort ordering is of interest even at the top level. query_planner() will
|
2000-02-07 05:41:04 +01:00
|
|
|
look for the cheapest path with a sort order matching the desired order,
|
2002-11-06 01:00:45 +01:00
|
|
|
and grouping_planner() will compare its cost to the cost of using the
|
|
|
|
cheapest-overall path and doing an explicit sort.
|
1999-08-16 04:17:58 +02:00
|
|
|
|
|
|
|
When we are generating paths for a particular RelOptInfo, we discard a path
|
|
|
|
if it is more expensive than another known path that has the same or better
|
|
|
|
sort order. We will never discard a path that is the only known way to
|
2000-02-07 05:41:04 +01:00
|
|
|
achieve a given sort order (without an explicit sort, that is). In this
|
|
|
|
way, the next level up will have the maximum freedom to build mergejoins
|
|
|
|
without sorting, since it can pick from any of the paths retained for its
|
|
|
|
inputs.
|
1999-08-16 04:17:58 +02:00
|
|
|
|
2000-07-24 05:11:01 +02:00
|
|
|
|
|
|
|
PathKeys
|
|
|
|
--------
|
|
|
|
|
|
|
|
The PathKeys data structure represents what is known about the sort order
|
|
|
|
of a particular Path.
|
|
|
|
|
|
|
|
Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
|
|
|
|
the sort order of the result generated by the Path. The n'th sublist
|
|
|
|
represents the n'th sort key of the result.
|
|
|
|
|
|
|
|
In single/base relation RelOptInfo's, the Paths represent various ways
|
|
|
|
of scanning the relation and the resulting ordering of the tuples.
|
|
|
|
Sequential scan Paths have NIL pathkeys, indicating no known ordering.
|
|
|
|
Index scans have Path.pathkeys that represent the chosen index's ordering,
|
|
|
|
if any. A single-key index would create a pathkey with a single sublist,
|
|
|
|
e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
|
|
|
|
per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
|
|
|
|
shows major sort by indexkey1 (ordering by sortop1) and minor sort by
|
|
|
|
indexkey2 with sortop2.
|
|
|
|
|
|
|
|
Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
|
|
|
|
we can say nothing about the overall order of its result. Also, an
|
|
|
|
indexscan on an unordered type of index generates NIL pathkeys. However,
|
|
|
|
we can always create a pathkey by doing an explicit sort. The pathkeys
|
|
|
|
for a Sort plan's output just represent the sort key fields and the
|
|
|
|
ordering operators used.
|
|
|
|
|
|
|
|
Things get more interesting when we consider joins. Suppose we do a
|
|
|
|
mergejoin between A and B using the mergeclause A.X = B.Y. The output
|
|
|
|
of the mergejoin is sorted by X --- but it is also sorted by Y. We
|
|
|
|
represent this fact by listing both keys in a single pathkey sublist:
|
|
|
|
( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
|
|
|
|
sort order of the Path can be taken to be *either* A.X or B.Y.
|
|
|
|
They are equal, so they are both primary sort keys. By doing this,
|
|
|
|
we allow future joins to use either var as a pre-sorted key, so upper
|
|
|
|
Mergejoins may be able to avoid having to re-sort the Path. This is
|
|
|
|
why pathkeys is a List of Lists.
|
|
|
|
|
|
|
|
We keep a sortop associated with each PathKeyItem because cross-data-type
|
|
|
|
mergejoins are possible; for example int4 = int8 is mergejoinable.
|
|
|
|
In this case we need to remember that the left var is ordered by int4lt
|
|
|
|
while the right var is ordered by int8lt. So the different members of
|
|
|
|
each sublist could have different sortops.
|
|
|
|
|
|
|
|
Note that while the order of the top list is meaningful (primary vs.
|
|
|
|
secondary sort key), the order of each sublist is arbitrary. Each sublist
|
|
|
|
should be regarded as a set of equivalent keys, with no significance
|
|
|
|
to the list order.
|
|
|
|
|
|
|
|
With a little further thought, it becomes apparent that pathkeys for
|
|
|
|
joins need not only come from mergejoins. For example, if we do a
|
|
|
|
nestloop join between outer relation A and inner relation B, then any
|
|
|
|
pathkeys relevant to A are still valid for the join result: we have
|
|
|
|
not altered the order of the tuples from A. Even more interesting,
|
|
|
|
if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
|
|
|
|
and A.X was a pathkey for the outer relation A, then we can assert that
|
|
|
|
B.Y is a pathkey for the join result; X was ordered before and still is,
|
|
|
|
and the joined values of Y are equal to the joined values of X, so Y
|
|
|
|
must now be ordered too. This is true even though we used neither an
|
|
|
|
explicit sort nor a mergejoin on Y.
|
|
|
|
|
|
|
|
More generally, whenever we have an equijoin clause A.X = B.Y and a
|
|
|
|
pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
|
|
|
|
relation the pathkey is for, *no matter how we formed the join*. It works
|
|
|
|
as long as the clause has been applied at some point while forming the
|
|
|
|
join relation. (In the current implementation, we always apply qual
|
|
|
|
clauses as soon as possible, ie, as far down in the plan tree as possible.
|
2000-12-14 23:30:45 +01:00
|
|
|
So we can treat the pathkeys as equivalent everywhere. The exception is
|
|
|
|
when the relations A and B are joined inside the nullable side of an
|
|
|
|
OUTER JOIN and the equijoin clause comes from above the OUTER JOIN. In this
|
|
|
|
case we cannot apply the qual as soon as A and B are joined, so we do not
|
|
|
|
consider the pathkeys to be equivalent. This could be improved if we wanted
|
|
|
|
to go to the trouble of making pathkey equivalence be context-dependent,
|
|
|
|
but that seems much more complex than it's worth.)
|
2000-07-24 05:11:01 +02:00
|
|
|
|
|
|
|
In short, then: when producing the pathkeys for a merge or nestloop join,
|
|
|
|
we can keep all of the keys of the outer path, since the ordering of the
|
|
|
|
outer path will be preserved in the result. Furthermore, we can add to
|
|
|
|
each pathkey sublist any inner vars that are equijoined to any of the
|
|
|
|
outer vars in the sublist; this works regardless of whether we are
|
|
|
|
implementing the join using that equijoin clause as a mergeclause,
|
|
|
|
or merely enforcing the clause after-the-fact as a qpqual filter.
|
|
|
|
|
|
|
|
Although Hashjoins also work only with equijoin operators, it is *not*
|
|
|
|
safe to consider the output of a Hashjoin to be sorted in any particular
|
|
|
|
order --- not even the outer path's order. This is true because the
|
|
|
|
executor might have to split the join into multiple batches. Therefore
|
|
|
|
a Hashjoin is always given NIL pathkeys. (Also, we need to use only
|
|
|
|
mergejoinable operators when deducing which inner vars are now sorted,
|
|
|
|
because a mergejoin operator tells us which left- and right-datatype
|
|
|
|
sortops can be considered equivalent, whereas a hashjoin operator
|
|
|
|
doesn't imply anything about sort order.)
|
|
|
|
|
|
|
|
Pathkeys are also useful to represent an ordering that we wish to achieve,
|
|
|
|
since they are easily compared to the pathkeys of a potential candidate
|
|
|
|
path. So, SortClause lists are turned into pathkeys lists for use inside
|
|
|
|
the optimizer.
|
|
|
|
|
|
|
|
OK, now for how it *really* works:
|
|
|
|
|
|
|
|
We did implement pathkeys just as described above, and found that the
|
|
|
|
planner spent a huge amount of time comparing pathkeys, because the
|
|
|
|
representation of pathkeys as unordered lists made it expensive to decide
|
|
|
|
whether two were equal or not. So, we've modified the representation
|
|
|
|
as described next.
|
|
|
|
|
|
|
|
If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
|
|
|
|
during planner startup, we can construct lists of equivalent pathkey items
|
|
|
|
for the query. There could be more than two items per equivalence set;
|
|
|
|
for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
|
|
|
|
equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
|
|
|
|
Any pathkey item that belongs to an equivalence set implies that all the
|
|
|
|
other items in its set apply to the relation too, or at least all the ones
|
|
|
|
that are for fields present in the relation. (Some of the items in the
|
|
|
|
set might be for as-yet-unjoined relations.) Furthermore, any multi-item
|
|
|
|
pathkey sublist that appears at any stage of planning the query *must* be
|
|
|
|
a subset of one or another of these equivalence sets; there's no way we'd
|
|
|
|
have put two items in the same pathkey sublist unless they were equijoined
|
|
|
|
in WHERE.
|
|
|
|
|
|
|
|
Now suppose that we allow a pathkey sublist to contain pathkey items for
|
|
|
|
vars that are not yet part of the pathkey's relation. This introduces
|
|
|
|
no logical difficulty, because such items can easily be seen to be
|
|
|
|
irrelevant; we just mandate that they be ignored. But having allowed
|
|
|
|
this, we can declare (by fiat) that any multiple-item pathkey sublist
|
|
|
|
must be "equal()" to the appropriate equivalence set. In effect,
|
|
|
|
whenever we make a pathkey sublist that mentions any var appearing in an
|
|
|
|
equivalence set, we instantly add all the other vars equivalenced to it,
|
|
|
|
whether they appear yet in the pathkey's relation or not. And we also
|
|
|
|
mandate that the pathkey sublist appear in the same order as the
|
2000-12-14 23:30:45 +01:00
|
|
|
equivalence set it comes from.
|
|
|
|
|
|
|
|
In fact, we can go even further, and say that the canonical representation
|
|
|
|
of a pathkey sublist is a pointer directly to the relevant equivalence set,
|
|
|
|
which is kept in a list of pathkey equivalence sets for the query. Then
|
|
|
|
pathkey sublist comparison reduces to pointer-equality checking! To do this
|
|
|
|
we also have to add single-element pathkey sublists to the query's list of
|
|
|
|
equivalence sets, but that's a small price to pay.
|
2000-07-24 05:11:01 +02:00
|
|
|
|
|
|
|
By the way, it's OK and even useful for us to build equivalence sets
|
|
|
|
that mention multiple vars from the same relation. For example, if
|
|
|
|
we have WHERE A.X = A.Y and we are scanning A using an index on X,
|
|
|
|
we can legitimately conclude that the path is sorted by Y as well;
|
|
|
|
and this could be handy if Y is the variable used in other join clauses
|
|
|
|
or ORDER BY. So, any WHERE clause with a mergejoinable operator can
|
|
|
|
contribute to an equivalence set, even if it's not a join clause.
|
|
|
|
|
|
|
|
As sketched so far, equijoin operators allow us to conclude that
|
|
|
|
A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
|
|
|
|
datatypes are involved. What is not immediately obvious is that to use
|
|
|
|
the "canonical pathkey" representation, we *must* make this deduction.
|
|
|
|
An example (from a real bug in Postgres 7.0) is a mergejoin for a query
|
|
|
|
like
|
|
|
|
SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
|
|
|
|
The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
|
|
|
|
(ie, both appear in the same canonical pathkey set). If we sort t1
|
|
|
|
and then apply a mergejoin, we *must* filter the t1 tuples using the
|
|
|
|
implied qualification f1 = f2, because otherwise the output of the sort
|
|
|
|
will be ordered by f1 or f2 (whichever we sort on) but not both. The
|
|
|
|
merge will then fail since (depending on which qual clause it applies
|
|
|
|
first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
|
|
|
|
actual output of the sort has neither of these orderings. The best fix
|
|
|
|
for this is to generate all the implied equality constraints for each
|
|
|
|
equijoin set and add these clauses to the query's qualification list.
|
|
|
|
In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
|
|
|
|
clause. The constraint will be applied as a qpqual to the output of the
|
|
|
|
scan on t1, resulting in sort output that is indeed ordered by both vars.
|
|
|
|
This approach provides more information to the selectivity estimation
|
|
|
|
code than it would otherwise have, and reduces the number of tuples
|
|
|
|
processed in join stages, so it's a win to make these deductions even
|
|
|
|
if we weren't forced to.
|
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
When we generate implied equality constraints, we may find ourselves
|
|
|
|
adding redundant clauses to specific relations. For example, consider
|
|
|
|
SELECT * FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c;
|
|
|
|
We will generate the implied clause t1.a = t3.c and add it to the tree.
|
|
|
|
This is good since it allows us to consider joining t1 and t3 directly,
|
|
|
|
which we otherwise wouldn't do. But when we reach the stage of joining
|
|
|
|
all three relations, we will have redundant join clauses --- eg, if we
|
|
|
|
join t1 and t2 first, then the path that joins (t1 t2) to t3 will have
|
|
|
|
both t2.b = t3.c and t1.a = t3.c as restriction clauses. This is bad;
|
|
|
|
not only is evaluation of the extra clause useless work at runtime,
|
|
|
|
but the selectivity estimator routines will underestimate the number
|
|
|
|
of tuples produced since they won't know that the two clauses are
|
|
|
|
perfectly redundant. We fix this by detecting and removing redundant
|
|
|
|
clauses as the restriction clause list is built for each join. (We
|
|
|
|
can't do it sooner, since which clauses are redundant will vary depending
|
|
|
|
on the join order.)
|
|
|
|
|
2000-07-24 05:11:01 +02:00
|
|
|
Yet another implication of all this is that mergejoinable operators
|
|
|
|
must form closed equivalence sets. For example, if "int2 = int4"
|
|
|
|
and "int4 = int8" are both marked mergejoinable, then there had better
|
|
|
|
be a mergejoinable "int2 = int8" operator as well. Otherwise, when
|
|
|
|
we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
|
|
|
|
while trying to create a representation of the implied clause
|
|
|
|
int2var = int8var.
|
|
|
|
|
2000-12-14 23:30:45 +01:00
|
|
|
An additional refinement we can make is to insist that canonical pathkey
|
|
|
|
lists (sort orderings) do not mention the same pathkey set more than once.
|
|
|
|
For example, a pathkey list ((A) (B) (A)) is redundant --- the second
|
|
|
|
occurrence of (A) does not change the ordering, since the data must already
|
|
|
|
be sorted by A. Although a user probably wouldn't write ORDER BY A,B,A
|
|
|
|
directly, such redundancies are more probable once equijoin equivalences
|
|
|
|
have been considered. Also, the system is likely to generate redundant
|
|
|
|
pathkey lists when computing the sort ordering needed for a mergejoin. By
|
|
|
|
eliminating the redundancy, we save time and improve planning, since the
|
|
|
|
planner will more easily recognize equivalent orderings as being equivalent.
|
|
|
|
|
2002-08-26 00:39:37 +02:00
|
|
|
Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
|
|
|
|
coding of our optimizer, he is available to field questions about
|
|
|
|
optimizer topics.
|
|
|
|
|
2000-07-24 05:11:01 +02:00
|
|
|
-- bjm & tgl
|
2002-08-26 00:39:37 +02:00
|
|
|
|