Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
Tom Lane 2007-01-20 20:45:41 +00:00
parent 2b7334d487
commit f41803bb39
35 changed files with 3882 additions and 2719 deletions


@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/xoper.sgml,v 1.37 2006/12/23 00:43:08 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/xoper.sgml,v 1.38 2007/01/20 20:45:38 tgl Exp $ -->
<sect1 id="xoper">
<title>User-Defined Operators</title>
@ -145,29 +145,29 @@ SELECT (a + b) AS c FROM test_complex;
<itemizedlist>
<listitem>
<para>
One way is to omit the <literal>COMMUTATOR</> clause in the first operator that
you define, and then provide one in the second operator's definition.
Since <productname>PostgreSQL</productname> knows that commutative
operators come in pairs, when it sees the second definition it will
automatically go back and fill in the missing <literal>COMMUTATOR</> clause in
the first definition.
One way is to omit the <literal>COMMUTATOR</> clause in the first operator that
you define, and then provide one in the second operator's definition.
Since <productname>PostgreSQL</productname> knows that commutative
operators come in pairs, when it sees the second definition it will
automatically go back and fill in the missing <literal>COMMUTATOR</> clause in
the first definition.
</para>
</listitem>
<listitem>
<para>
The other, more straightforward way is just to include <literal>COMMUTATOR</> clauses
in both definitions. When <productname>PostgreSQL</productname> processes
the first definition and realizes that <literal>COMMUTATOR</> refers to a nonexistent
operator, the system will make a dummy entry for that operator in the
system catalog. This dummy entry will have valid data only
for the operator name, left and right operand types, and result type,
since that's all that <productname>PostgreSQL</productname> can deduce
at this point. The first operator's catalog entry will link to this
dummy entry. Later, when you define the second operator, the system
updates the dummy entry with the additional information from the second
definition. If you try to use the dummy operator before it's been filled
in, you'll just get an error message.
The other, more straightforward way is just to include <literal>COMMUTATOR</> clauses
in both definitions. When <productname>PostgreSQL</productname> processes
the first definition and realizes that <literal>COMMUTATOR</> refers to a nonexistent
operator, the system will make a dummy entry for that operator in the
system catalog. This dummy entry will have valid data only
for the operator name, left and right operand types, and result type,
since that's all that <productname>PostgreSQL</productname> can deduce
at this point. The first operator's catalog entry will link to this
dummy entry. Later, when you define the second operator, the system
updates the dummy entry with the additional information from the second
definition. If you try to use the dummy operator before it's been filled
in, you'll just get an error message.
</para>
</listitem>
</itemizedlist>
@ -240,7 +240,7 @@ column OP constant
one of the system's standard estimators for many of your own operators.
These are the standard restriction estimators:
<simplelist>
<member><function>eqsel</> for <literal>=</></member>
<member><function>eqsel</> for <literal>=</></member>
<member><function>neqsel</> for <literal>&lt;&gt;</></member>
<member><function>scalarltsel</> for <literal>&lt;</> or <literal>&lt;=</></member>
<member><function>scalargtsel</> for <literal>&gt;</> or <literal>&gt;=</></member>
@ -337,7 +337,7 @@ table1.column1 OP table2.column2
join will never compare them at all, implicitly assuming that the
result of the join operator must be false. So it never makes sense
to specify <literal>HASHES</literal> for operators that do not represent
equality.
some form of equality.
</para>
<para>
@ -347,7 +347,7 @@ table1.column1 OP table2.column2
exist yet. But attempts to use the operator in hash joins will fail
at run time if no such operator family exists. The system needs the
operator family to find the data-type-specific hash function for the
operator's input data type. Of course, you must also supply a suitable
operator's input data type. Of course, you must also create a suitable
hash function before you can create the operator family.
</para>
@ -382,8 +382,9 @@ table1.column1 OP table2.column2
false, never null, for any two nonnull inputs. If this rule is
not followed, hash-optimization of <literal>IN</> operations may
generate wrong results. (Specifically, <literal>IN</> might return
false where the correct answer according to the standard would be null; or it might
yield an error complaining that it wasn't prepared for a null result.)
false where the correct answer according to the standard would be null;
or it might yield an error complaining that it wasn't prepared for a
null result.)
</para>
</note>
@ -407,19 +408,18 @@ table1.column1 OP table2.column2
that can only succeed for pairs of values that fall at the
<quote>same place</>
in the sort order. In practice this means that the join operator must
behave like equality. But unlike hash join, where the left and right
data types had better be the same (or at least bitwise equivalent),
it is possible to merge-join two
behave like equality. But it is possible to merge-join two
distinct data types so long as they are logically compatible. For
example, the <type>smallint</type>-versus-<type>integer</type> equality operator
is merge-joinable.
example, the <type>smallint</type>-versus-<type>integer</type>
equality operator is merge-joinable.
We only need sorting operators that will bring both data types into a
logically compatible sequence.
</para>
<para>
To be marked <literal>MERGES</literal>, the join operator must appear
in a btree index operator family. This is not enforced when you create
as an equality member of a btree index operator family.
This is not enforced when you create
the operator, since of course the referencing operator family couldn't
exist yet. But the operator will not actually be used for merge joins
unless a matching operator family can be found. The
@ -428,30 +428,14 @@ table1.column1 OP table2.column2
</para>
<para>
There are additional restrictions on operators that you mark
merge-joinable. These restrictions are not currently checked by
<command>CREATE OPERATOR</command>, but errors may occur when
the operator is used if any are not true:
<itemizedlist>
<listitem>
<para>
A merge-joinable equality operator must have a merge-joinable
commutator (itself if the two operand data types are the same, or a related
equality operator if they are different).
</para>
</listitem>
<listitem>
<para>
If there is a merge-joinable operator relating any two data types
A and B, and another merge-joinable operator relating B to any
third data type C, then A and C must also have a merge-joinable
operator; in other words, having a merge-joinable operator must
be transitive.
</para>
</listitem>
</itemizedlist>
A merge-joinable operator must have a commutator (itself if the two
operand data types are the same, or a related equality operator
if they are different) that appears in the same operator family.
If this is not the case, planner errors may occur when the operator
is used. Also, it is a good idea (but not strictly required) for
a btree operator family that supports multiple datatypes to provide
equality operators for every combination of the datatypes; this
allows better optimization.
</para>
<note>


@ -15,7 +15,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.361 2007/01/10 18:06:02 tgl Exp $
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.362 2007/01/20 20:45:38 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -1284,16 +1284,18 @@ _copyFromExpr(FromExpr *from)
*/
/*
* _copyPathKeyItem
* _copyPathKey
*/
static PathKeyItem *
_copyPathKeyItem(PathKeyItem *from)
static PathKey *
_copyPathKey(PathKey *from)
{
PathKeyItem *newnode = makeNode(PathKeyItem);
PathKey *newnode = makeNode(PathKey);
COPY_NODE_FIELD(key);
COPY_SCALAR_FIELD(sortop);
COPY_SCALAR_FIELD(nulls_first);
/* EquivalenceClasses are never moved, so just shallow-copy the pointer */
COPY_SCALAR_FIELD(pk_eclass);
COPY_SCALAR_FIELD(pk_opfamily);
COPY_SCALAR_FIELD(pk_strategy);
COPY_SCALAR_FIELD(pk_nulls_first);
return newnode;
}
@ -1316,21 +1318,15 @@ _copyRestrictInfo(RestrictInfo *from)
COPY_BITMAPSET_FIELD(left_relids);
COPY_BITMAPSET_FIELD(right_relids);
COPY_NODE_FIELD(orclause);
/* EquivalenceClasses are never copied, so shallow-copy the pointers */
COPY_SCALAR_FIELD(parent_ec);
COPY_SCALAR_FIELD(eval_cost);
COPY_SCALAR_FIELD(this_selec);
COPY_SCALAR_FIELD(mergejoinoperator);
COPY_SCALAR_FIELD(left_sortop);
COPY_SCALAR_FIELD(right_sortop);
COPY_SCALAR_FIELD(mergeopfamily);
/*
* Do not copy pathkeys, since they'd not be canonical in a copied query
*/
newnode->left_pathkey = NIL;
newnode->right_pathkey = NIL;
COPY_SCALAR_FIELD(left_mergescansel);
COPY_SCALAR_FIELD(right_mergescansel);
COPY_NODE_FIELD(mergeopfamilies);
/* EquivalenceClasses are never copied, so shallow-copy the pointers */
COPY_SCALAR_FIELD(left_ec);
COPY_SCALAR_FIELD(right_ec);
COPY_SCALAR_FIELD(outer_is_left);
COPY_SCALAR_FIELD(hashjoinoperator);
COPY_SCALAR_FIELD(left_bucketsize);
COPY_SCALAR_FIELD(right_bucketsize);
@ -3033,8 +3029,8 @@ copyObject(void *from)
/*
* RELATION NODES
*/
case T_PathKeyItem:
retval = _copyPathKeyItem(from);
case T_PathKey:
retval = _copyPathKey(from);
break;
case T_RestrictInfo:
retval = _copyRestrictInfo(from);


@ -18,7 +18,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/equalfuncs.c,v 1.295 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/nodes/equalfuncs.c,v 1.296 2007/01/20 20:45:38 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -596,11 +596,27 @@ _equalFromExpr(FromExpr *a, FromExpr *b)
*/
static bool
_equalPathKeyItem(PathKeyItem *a, PathKeyItem *b)
_equalPathKey(PathKey *a, PathKey *b)
{
COMPARE_NODE_FIELD(key);
COMPARE_SCALAR_FIELD(sortop);
COMPARE_SCALAR_FIELD(nulls_first);
/*
* This is normally used on non-canonicalized PathKeys, so must chase
* up to the topmost merged EquivalenceClass and see if those are the
* same (by pointer equality).
*/
EquivalenceClass *a_eclass;
EquivalenceClass *b_eclass;
a_eclass = a->pk_eclass;
while (a_eclass->ec_merged)
a_eclass = a_eclass->ec_merged;
b_eclass = b->pk_eclass;
while (b_eclass->ec_merged)
b_eclass = b_eclass->ec_merged;
if (a_eclass != b_eclass)
return false;
COMPARE_SCALAR_FIELD(pk_opfamily);
COMPARE_SCALAR_FIELD(pk_strategy);
COMPARE_SCALAR_FIELD(pk_nulls_first);
return true;
}
@ -2016,8 +2032,8 @@ equal(void *a, void *b)
/*
* RELATION NODES
*/
case T_PathKeyItem:
retval = _equalPathKeyItem(a, b);
case T_PathKey:
retval = _equalPathKey(a, b);
break;
case T_RestrictInfo:
retval = _equalRestrictInfo(a, b);


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.293 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.294 2007/01/20 20:45:38 tgl Exp $
*
* NOTES
* Every node type that can appear in stored rules' parsetrees *must*
@ -1196,29 +1196,11 @@ _outNestPath(StringInfo str, NestPath *node)
static void
_outMergePath(StringInfo str, MergePath *node)
{
int numCols;
int i;
WRITE_NODE_TYPE("MERGEPATH");
_outJoinPathInfo(str, (JoinPath *) node);
WRITE_NODE_FIELD(path_mergeclauses);
numCols = list_length(node->path_mergeclauses);
appendStringInfo(str, " :path_mergeFamilies");
for (i = 0; i < numCols; i++)
appendStringInfo(str, " %u", node->path_mergeFamilies[i]);
appendStringInfo(str, " :path_mergeStrategies");
for (i = 0; i < numCols; i++)
appendStringInfo(str, " %d", node->path_mergeStrategies[i]);
appendStringInfo(str, " :path_mergeNullsFirst");
for (i = 0; i < numCols; i++)
appendStringInfo(str, " %d", (int) node->path_mergeNullsFirst[i]);
WRITE_NODE_FIELD(outersortkeys);
WRITE_NODE_FIELD(innersortkeys);
}
@ -1241,7 +1223,8 @@ _outPlannerInfo(StringInfo str, PlannerInfo *node)
/* NB: this isn't a complete set of fields */
WRITE_NODE_FIELD(parse);
WRITE_NODE_FIELD(join_rel_list);
WRITE_NODE_FIELD(equi_key_list);
WRITE_NODE_FIELD(eq_classes);
WRITE_NODE_FIELD(canon_pathkeys);
WRITE_NODE_FIELD(left_join_clauses);
WRITE_NODE_FIELD(right_join_clauses);
WRITE_NODE_FIELD(full_join_clauses);
@ -1284,6 +1267,7 @@ _outRelOptInfo(StringInfo str, RelOptInfo *node)
WRITE_NODE_FIELD(subplan);
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(index_outer_relids);
WRITE_NODE_FIELD(index_inner_paths);
}
@ -1306,13 +1290,48 @@ _outIndexOptInfo(StringInfo str, IndexOptInfo *node)
}
static void
_outPathKeyItem(StringInfo str, PathKeyItem *node)
_outEquivalenceClass(StringInfo str, EquivalenceClass *node)
{
WRITE_NODE_TYPE("PATHKEYITEM");
/*
* To simplify reading, we just chase up to the topmost merged EC and
* print that, without bothering to show the merge-ees separately.
*/
while (node->ec_merged)
node = node->ec_merged;
WRITE_NODE_FIELD(key);
WRITE_OID_FIELD(sortop);
WRITE_BOOL_FIELD(nulls_first);
WRITE_NODE_TYPE("EQUIVALENCECLASS");
WRITE_NODE_FIELD(ec_opfamilies);
WRITE_NODE_FIELD(ec_members);
WRITE_NODE_FIELD(ec_sources);
WRITE_BITMAPSET_FIELD(ec_relids);
WRITE_BOOL_FIELD(ec_has_const);
WRITE_BOOL_FIELD(ec_has_volatile);
WRITE_BOOL_FIELD(ec_below_outer_join);
WRITE_BOOL_FIELD(ec_broken);
}
static void
_outEquivalenceMember(StringInfo str, EquivalenceMember *node)
{
WRITE_NODE_TYPE("EQUIVALENCEMEMBER");
WRITE_NODE_FIELD(em_expr);
WRITE_BITMAPSET_FIELD(em_relids);
WRITE_BOOL_FIELD(em_is_const);
WRITE_BOOL_FIELD(em_is_child);
WRITE_OID_FIELD(em_datatype);
}
static void
_outPathKey(StringInfo str, PathKey *node)
{
WRITE_NODE_TYPE("PATHKEY");
WRITE_NODE_FIELD(pk_eclass);
WRITE_OID_FIELD(pk_opfamily);
WRITE_INT_FIELD(pk_strategy);
WRITE_BOOL_FIELD(pk_nulls_first);
}
static void
@ -1331,12 +1350,11 @@ _outRestrictInfo(StringInfo str, RestrictInfo *node)
WRITE_BITMAPSET_FIELD(left_relids);
WRITE_BITMAPSET_FIELD(right_relids);
WRITE_NODE_FIELD(orclause);
WRITE_OID_FIELD(mergejoinoperator);
WRITE_OID_FIELD(left_sortop);
WRITE_OID_FIELD(right_sortop);
WRITE_OID_FIELD(mergeopfamily);
WRITE_NODE_FIELD(left_pathkey);
WRITE_NODE_FIELD(right_pathkey);
WRITE_NODE_FIELD(parent_ec);
WRITE_NODE_FIELD(mergeopfamilies);
WRITE_NODE_FIELD(left_ec);
WRITE_NODE_FIELD(right_ec);
WRITE_BOOL_FIELD(outer_is_left);
WRITE_OID_FIELD(hashjoinoperator);
}
@ -2163,8 +2181,14 @@ _outNode(StringInfo str, void *obj)
case T_IndexOptInfo:
_outIndexOptInfo(str, obj);
break;
case T_PathKeyItem:
_outPathKeyItem(str, obj);
case T_EquivalenceClass:
_outEquivalenceClass(str, obj);
break;
case T_EquivalenceMember:
_outEquivalenceMember(str, obj);
break;
case T_PathKey:
_outPathKey(str, obj);
break;
case T_RestrictInfo:
_outRestrictInfo(str, obj);


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/print.c,v 1.82 2007/01/05 22:19:30 momjian Exp $
* $PostgreSQL: pgsql/src/backend/nodes/print.c,v 1.83 2007/01/20 20:45:38 tgl Exp $
*
* HISTORY
* AUTHOR DATE MAJOR EVENT
@ -404,7 +404,7 @@ print_expr(Node *expr, List *rtable)
/*
* print_pathkeys -
* pathkeys list of list of PathKeyItems
* pathkeys list of PathKeys
*/
void
print_pathkeys(List *pathkeys, List *rtable)
@ -414,17 +414,26 @@ print_pathkeys(List *pathkeys, List *rtable)
printf("(");
foreach(i, pathkeys)
{
List *pathkey = (List *) lfirst(i);
PathKey *pathkey = (PathKey *) lfirst(i);
EquivalenceClass *eclass;
ListCell *k;
bool first = true;
eclass = pathkey->pk_eclass;
/* chase up, in case pathkey is non-canonical */
while (eclass->ec_merged)
eclass = eclass->ec_merged;
printf("(");
foreach(k, pathkey)
foreach(k, eclass->ec_members)
{
PathKeyItem *item = (PathKeyItem *) lfirst(k);
EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
print_expr(item->key, rtable);
if (lnext(k))
if (first)
first = false;
else
printf(", ");
print_expr((Node *) mem->em_expr, rtable);
}
printf(")");
if (lnext(i))


@ -90,21 +90,19 @@ have a list of relations to join. However, FULL OUTER JOIN clauses are
never flattened, and other kinds of JOIN might not be either, if the
flattening process is stopped by join_collapse_limit or from_collapse_limit
restrictions. Therefore, we end up with a planning problem that contains
both lists of relations to be joined in any order, and JOIN nodes that
force a particular join order. For each un-flattened JOIN node, we join
exactly that pair of relations (after recursively planning their inputs,
if the inputs aren't single base relations). We generate a Path for each
feasible join method, and select the cheapest Path. Note that the JOIN
clause structure determines the join Path structure, but it doesn't
constrain the join implementation method at each join (nestloop, merge,
hash), nor does it say which rel is considered outer or inner at each
join. We consider all these possibilities in building Paths.
lists of relations to be joined in any order, where any individual item
might be a sub-list that has to be joined together before we can consider
joining it to its siblings. We process these sub-problems recursively,
bottom up. Note that the join list structure constrains the possible join
orders, but it doesn't constrain the join implementation method at each
join (nestloop, merge, hash), nor does it say which rel is considered outer
or inner at each join. We consider all these possibilities in building
Paths. We generate a Path for each feasible join method, and select the
cheapest Path.
3) At the top level of the FROM clause we will have a list of relations
that are either base rels or joinrels constructed per un-flattened JOIN
directives. (This is also the situation, recursively, when we can flatten
sub-joins underneath an un-flattenable JOIN into a list of relations to
join.) We can join these rels together in any order the planner sees fit.
For each planning problem, therefore, we will have a list of relations
that are either base rels or joinrels constructed per sub-join-lists.
We can join these rels together in any order the planner sees fit.
The standard (non-GEQO) planner does this as follows:
Consider joining each RelOptInfo to each other RelOptInfo specified in its
@ -114,17 +112,17 @@ choice but to generate a clauseless Cartesian-product join; so we consider
joining that rel to each other available rel. But in the presence of join
clauses we will only consider joins that use available join clauses.)
If we only had two relations in the FROM list, we are done: we just pick
If we only had two relations in the list, we are done: we just pick
the cheapest path for the join RelOptInfo. If we had more than two, we now
need to consider ways of joining join RelOptInfos to each other to make
join RelOptInfos that represent more than two FROM items.
join RelOptInfos that represent more than two list items.
The join tree is constructed using a "dynamic programming" algorithm:
in the first pass (already described) we consider ways to create join rels
representing exactly two FROM items. The second pass considers ways
to make join rels that represent exactly three FROM items; the next pass,
representing exactly two list items. The second pass considers ways
to make join rels that represent exactly three list items; the next pass,
four items, etc. The last pass considers how to make the final join
relation that includes all FROM items --- obviously there can be only one
relation that includes all list items --- obviously there can be only one
join rel at this top level, whereas there can be more than one join rel
at lower levels. At each level we use joins that follow available join
clauses, if possible, just as described for the first level.
@ -155,7 +153,7 @@ For example:
{1 2 3 4}
We consider left-handed plans (the outer rel of an upper join is a joinrel,
but the inner is always a single FROM item); right-handed plans (outer rel
but the inner is always a single list item); right-handed plans (outer rel
is always a single item); and bushy plans (both inner and outer can be
joins themselves). For example, when building {1 2 3 4} we consider
joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
@ -336,7 +334,9 @@ RelOptInfo - a relation or joined relations
MergePath - merge joins
HashPath - hash joins
PathKeys - a data structure representing the ordering of a path
EquivalenceClass - a data structure representing a set of values known equal
PathKey - a data structure representing the sort ordering of a path
The optimizer spends a good deal of its time worrying about the ordering
of the tuples returned by a path. The reason this is useful is that by
@ -363,213 +363,250 @@ without sorting, since it can pick from any of the paths retained for its
inputs.
EquivalenceClasses
------------------
During the deconstruct_jointree() scan of the query's qual clauses, we look
for mergejoinable equality clauses A = B whose applicability is not delayed
by an outer join; these are called "equivalence clauses". When we find
one, we create an EquivalenceClass containing the expressions A and B to
record this knowledge. If we later find another equivalence clause B = C,
we add C to the existing EquivalenceClass for {A B}; this may require
merging two existing EquivalenceClasses. At the end of the scan, we have
sets of values that are known all transitively equal to each other. We can
therefore use a comparison of any pair of the values as a restriction or
join clause (when these values are available at the scan or join, of
course); furthermore, we need test only one such comparison, not all of
them. Therefore, equivalence clauses are removed from the standard qual
distribution process. Instead, when preparing a restriction or join clause
list, we examine each EquivalenceClass to see if it can contribute a
clause, and if so we select an appropriate pair of values to compare. For
example, if we are trying to join A's relation to C's, we can generate the
clause A = C, even though this appeared nowhere explicitly in the original
query. This may allow us to explore join paths that otherwise would have
been rejected as requiring Cartesian-product joins.
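
The merging step described above can be sketched in a few lines of C. This is a minimal illustration, not the planner's code: ToyEC, toy_ec_root, toy_ec_merge, and MAX_MEMBERS are hypothetical names, and members are plain strings rather than expression trees. The only detail borrowed from the commit is the ec_merged forwarding link, which the patched _equalPathKey and print_pathkeys chase to reach the topmost merged class.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMBERS 8			/* hypothetical fixed capacity, for brevity */

/* Toy stand-in for an EquivalenceClass; NOT the real planner struct. */
typedef struct ToyEC
{
	struct ToyEC *ec_merged;	/* set once this class has been absorbed */
	const char *members[MAX_MEMBERS];
	int			nmembers;
} ToyEC;

/* Chase ec_merged links up to the surviving (topmost) class. */
static ToyEC *
toy_ec_root(ToyEC *ec)
{
	assert(ec != NULL);
	while (ec->ec_merged)
		ec = ec->ec_merged;
	return ec;
}

/*
 * Record an equivalence between a member of class a and a member of
 * class b: fold b's members into a and leave a forwarding link in b.
 */
static ToyEC *
toy_ec_merge(ToyEC *a, ToyEC *b)
{
	int			i;

	a = toy_ec_root(a);
	b = toy_ec_root(b);
	if (a == b)
		return a;				/* already known equal */

	assert(a->nmembers + b->nmembers <= MAX_MEMBERS);
	for (i = 0; i < b->nmembers; i++)
		a->members[a->nmembers++] = b->members[i];
	b->nmembers = 0;
	b->ec_merged = a;			/* old pointers to b still resolve to a */
	return a;
}
```

Starting from {A B} and {C}, a clause B = C would call toy_ec_merge on the two classes, leaving one class with three members. Any pointer still referring to the absorbed class reaches the survivor through toy_ec_root.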
Sometimes an EquivalenceClass may contain a pseudo-constant expression
(i.e., one not containing Vars or Aggs of the current query level, nor
volatile functions). In this case we do not follow the policy of
dynamically generating join clauses: instead, we dynamically generate
restriction clauses "var = const" wherever one of the variable members of
the class can first be computed. For example, if we have A = B and B = 42,
we effectively generate the restriction clauses A = 42 and B = 42, and then
we need not bother with explicitly testing the join clause A = B when the
relations are joined. In effect, all the class members can be tested at
relation-scan level and there's never a need for join tests.
The precise technical interpretation of an EquivalenceClass is that it
asserts that at any plan node where more than one of its member values
can be computed, output rows in which the values are not all equal may
be discarded without affecting the query result. (We require all levels
of the plan to enforce EquivalenceClasses, hence a join need not recheck
equality of values that were computable by one of its children.) For an
ordinary EquivalenceClass that is "valid everywhere", we can further infer
that the values are all non-null, because all mergejoinable operators are
strict. However, we also allow equivalence clauses that appear below the
nullable side of an outer join to form EquivalenceClasses; for these
classes, the interpretation is that either all the values are equal, or
all (except pseudo-constants) have gone to null. (This requires a
limitation that non-constant members be strict, else they might not go
to null when the other members do.) Consider for example
SELECT *
FROM a LEFT JOIN
(SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
ON a.x = ss.y
WHERE a.x = 42;
We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed
clauses from forming EquivalenceClasses is exactly that we want to be able
to push any derived clauses as far down as possible.) But once above the
outer join it's no longer necessarily the case that b.y = 10, and thus we
cannot use such EquivalenceClasses to conclude that sorting is unnecessary
(see discussion of PathKeys below).
In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
equivalence clause because its applicability to b is delayed by the outer
join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
But since we see that a.x has been equated to 42 above the outer join, we
are able to form a below-outer-join class {b.y 42}; this restriction can be
added because no b/c row not having b.y = 42 can contribute to the result
of the outer join, and so we need not compute such rows. Now this class
will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
which lets the planner deduce that the b/c join need not be computed at all
because none of its rows can contribute to the outer join. (This gets
implemented as a gating Result filter, since more usually the potential
contradiction involves Param values rather than just Consts, and thus has
to be checked at runtime.)
To aid in determining the sort ordering(s) that can work with a mergejoin,
we mark each mergejoinable clause with the EquivalenceClasses of its left
and right inputs. For an equivalence clause, these are of course the same
EquivalenceClass. For a non-equivalence mergejoinable clause (such as an
outer-join qualification), we generate two separate EquivalenceClasses for
the left and right inputs. This may result in creating single-item
equivalence "classes", though of course these are still subject to merging
if other equivalence clauses are later found to bear on the same
expressions.
Another way that we may form a single-item EquivalenceClass is in creation
of a PathKey to represent a desired sort order (see below). This is a bit
different from the above cases because such an EquivalenceClass might
contain an aggregate function or volatile expression. (A clause containing
a volatile function will never be considered mergejoinable, even if its top
operator is mergejoinable, so there is no way for a volatile expression to
get into EquivalenceClasses otherwise. Aggregates are disallowed in WHERE
altogether, so will never be found in a mergejoinable clause.) This is just
a convenience to maintain a uniform PathKey representation: such an
EquivalenceClass will never be merged with any other.
An EquivalenceClass also contains a list of btree opfamily OIDs, which
determines what the equalities it represents actually "mean". All the
equivalence clauses that contribute to an EquivalenceClass must have
equality operators that belong to the same set of opfamilies. (Note: most
of the time, a particular equality operator belongs to only one family, but
it's possible that it belongs to more than one. We keep track of all the
families to ensure that we can make use of an index belonging to any one of
the families for mergejoin purposes.)
PathKeys
--------
The PathKeys data structure represents what is known about the sort order
of a particular Path.
of the tuples generated by a particular Path. A path's pathkeys field is a
list of PathKey nodes, where the n'th item represents the n'th sort key of
the result. Each PathKey contains these fields:
Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
the sort order of the result generated by the Path. The n'th sublist
represents the n'th sort key of the result.
* a reference to an EquivalenceClass
* a btree opfamily OID (must match one of those in the EC)
* a sort direction (ascending or descending)
* a nulls-first-or-last flag
The EquivalenceClass represents the value being sorted on. Since the
various members of an EquivalenceClass are known equal according to the
opfamily, we can consider a path sorted by any one of them to be sorted by
any other too; this is what justifies referencing the whole
EquivalenceClass rather than just one member of it.
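
As a rough sketch of the four fields listed above, the following C fragment models a PathKey and an equality test in the spirit of the patched _equalPathKey. The field names pk_eclass, pk_opfamily, pk_strategy, pk_nulls_first, and ec_merged come from the commit; the Toy* types and toy_pathkey_equal are hypothetical simplifications, and the OID and strategy values used with them are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyOid;	/* stand-in for Oid */

/* Only the forwarding link matters for this sketch. */
typedef struct ToyEClass
{
	struct ToyEClass *ec_merged;
} ToyEClass;

/* Toy model of one sort key of a path. */
typedef struct ToyPathKey
{
	ToyEClass  *pk_eclass;		/* the value being sorted on */
	ToyOid		pk_opfamily;	/* btree opfamily defining the ordering */
	int			pk_strategy;	/* sort direction */
	bool		pk_nulls_first; /* do NULLs sort before normal values? */
} ToyPathKey;

/* Resolve a possibly-absorbed class to its topmost merged class. */
static ToyEClass *
toy_topmost(ToyEClass *ec)
{
	assert(ec != NULL);
	while (ec->ec_merged)
		ec = ec->ec_merged;
	return ec;
}

/*
 * Two pathkeys match if they resolve to the same class (by pointer
 * equality) and agree on opfamily, direction, and NULLs placement.
 */
static bool
toy_pathkey_equal(const ToyPathKey *a, const ToyPathKey *b)
{
	if (toy_topmost(a->pk_eclass) != toy_topmost(b->pk_eclass))
		return false;
	return a->pk_opfamily == b->pk_opfamily &&
		a->pk_strategy == b->pk_strategy &&
		a->pk_nulls_first == b->pk_nulls_first;
}
```

Because the class is compared by pointer after chasing ec_merged, a pathkey built on X and one built on Y compare equal once X and Y belong to the same class, which is the property the paragraph above relies on.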
In single/base relation RelOptInfo's, the Paths represent various ways
of scanning the relation and the resulting ordering of the tuples.
Sequential scan Paths have NIL pathkeys, indicating no known ordering.
Index scans have Path.pathkeys that represent the chosen index's ordering,
if any. A single-key index would create a pathkey with a single sublist,
e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
shows major sort by indexkey1 (ordering by sortop1) and minor sort by
indexkey2 with sortop2.
if any. A single-key index would create a single-PathKey list, while a
multi-column index generates a list with one element per index column.
(Actually, since an index can be scanned either forward or backward, there
are two possible sort orders and two possible PathKey lists it can
generate.)
Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
we can say nothing about the overall order of its result. Also, an
indexscan on an unordered type of index generates NIL pathkeys. However,
we can always create a pathkey by doing an explicit sort. The pathkeys
for a Sort plan's output just represent the sort key fields and the
ordering operators used.
Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
pathkeys since we can say nothing about the overall order of its result.
Also, an indexscan on an unordered type of index generates NIL pathkeys.
However, we can always create a pathkey by doing an explicit sort. The
pathkeys for a Sort plan's output just represent the sort key fields and
the ordering operators used.
Things get more interesting when we consider joins. Suppose we do a
mergejoin between A and B using the mergeclause A.X = B.Y. The output
of the mergejoin is sorted by X --- but it is also sorted by Y. We
represent this fact by listing both keys in a single pathkey sublist:
( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
sort order of the Path can be taken to be *either* A.X or B.Y.
They are equal, so they are both primary sort keys. By doing this,
we allow future joins to use either var as a pre-sorted key, so upper
Mergejoins may be able to avoid having to re-sort the Path. This is
why pathkeys is a List of Lists.
of the mergejoin is sorted by X --- but it is also sorted by Y. Again,
this can be represented by a PathKey referencing an EquivalenceClass
containing both X and Y.
We keep a sortop associated with each PathKeyItem because cross-data-type
mergejoins are possible; for example int4 = int8 is mergejoinable.
In this case we need to remember that the left var is ordered by int4lt
while the right var is ordered by int8lt. So the different members of
each sublist could have different sortops.
Note that while the order of the top list is meaningful (primary vs.
secondary sort key), the order of each sublist is arbitrary. Each sublist
should be regarded as a set of equivalent keys, with no significance
to the list order.
With a little further thought, it becomes apparent that pathkeys for
joins need not only come from mergejoins. For example, if we do a
nestloop join between outer relation A and inner relation B, then any
pathkeys relevant to A are still valid for the join result: we have
not altered the order of the tuples from A. Even more interesting,
if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
and A.X was a pathkey for the outer relation A, then we can assert that
B.Y is a pathkey for the join result; X was ordered before and still is,
and the joined values of Y are equal to the joined values of X, so Y
With a little further thought, it becomes apparent that nestloop joins
can also produce sorted output. For example, if we do a nestloop join
between outer relation A and inner relation B, then any pathkeys relevant
to A are still valid for the join result: we have not altered the order of
the tuples from A. Even more interesting, if there was an equivalence clause
A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
that B.Y is a pathkey for the join result; X was ordered before and still
is, and the joined values of Y are equal to the joined values of X, so Y
must now be ordered too. This is true even though we used neither an
explicit sort nor a mergejoin on Y.
explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted
on to preserve the order of their outer relation, because the executor
might decide to "batch" the join, so we always set pathkeys to NIL for
a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the
ordering of its outer relation, because it might insert nulls at random
points in the ordering.
More generally, whenever we have an equijoin clause A.X = B.Y and a
pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
relation the pathkey is for, *no matter how we formed the join*. It works
as long as the clause has been applied at some point while forming the
join relation. (In the current implementation, we always apply qual
clauses as soon as possible, ie, as far down in the plan tree as possible.
So we can treat the pathkeys as equivalent everywhere. The exception is
when the relations A and B are joined inside the nullable side of an
OUTER JOIN and the equijoin clause comes from above the OUTER JOIN. In this
case we cannot apply the qual as soon as A and B are joined, so we do not
consider the pathkeys to be equivalent. This could be improved if we wanted
to go to the trouble of making pathkey equivalence be context-dependent,
but that seems much more complex than it's worth.)
In general, we can justify using EquivalenceClasses as the basis for
pathkeys because, whenever we scan a relation containing multiple
EquivalenceClass members or join two relations each containing
EquivalenceClass members, we apply restriction or join clauses derived from
the EquivalenceClass. This guarantees that any two values listed in the
EquivalenceClass are in fact equal in all tuples emitted by the scan or
join, and therefore that if the tuples are sorted by one of the values,
they can be considered sorted by any other as well. It does not matter
whether the test clause is used as a mergeclause, or merely enforced
after-the-fact as a qpqual filter.
In short, then: when producing the pathkeys for a merge or nestloop join,
we can keep all of the keys of the outer path, since the ordering of the
outer path will be preserved in the result. Furthermore, we can add to
each pathkey sublist any inner vars that are equijoined to any of the
outer vars in the sublist; this works regardless of whether we are
implementing the join using that equijoin clause as a mergeclause,
or merely enforcing the clause after-the-fact as a qpqual filter.
Although Hashjoins also work only with equijoin operators, it is *not*
safe to consider the output of a Hashjoin to be sorted in any particular
order --- not even the outer path's order. This is true because the
executor might have to split the join into multiple batches. Therefore
a Hashjoin is always given NIL pathkeys. (Also, we need to use only
mergejoinable operators when deducing which inner vars are now sorted,
because a mergejoin operator tells us which left- and right-datatype
sortops can be considered equivalent, whereas a hashjoin operator
doesn't imply anything about sort order.)
Note that there is no particular difficulty in labeling a path's sort
order with a PathKey referencing an EquivalenceClass that contains
variables not yet joined into the path's output. We can simply ignore
such entries as not being relevant (yet). This makes it possible to
use the same EquivalenceClasses throughout the join planning process.
In fact, by being careful not to generate multiple identical PathKey
objects, we can reduce comparison of EquivalenceClasses and PathKeys
to simple pointer comparison, which is a huge savings because add_path
has to make a large number of PathKey comparisons in deciding whether
competing Paths are equivalently sorted.
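
As a rough illustration of why uniqueness matters, here is a minimal
sketch of a canonical-PathKey lookup. It is not the planner's actual code:
the struct layouts are simplified stand-ins, and the fixed-size table and
the function names get_canonical_pathkey and canonical_pathkeys_equal are
assumptions made for this example only. The point is that once every
distinct (eclass, opfamily, strategy, nulls_first) combination has exactly
one object, comparing two sort orderings needs no field-by-field check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the planner's structures (illustration only). */
typedef struct EquivalenceClass
{
	int			ec_id;			/* illustrative identifier, not a real field */
} EquivalenceClass;

typedef struct PathKey
{
	EquivalenceClass *pk_eclass;	/* the set of values that is ordered */
	unsigned int pk_opfamily;	/* opfamily defining the ordering */
	int			pk_strategy;	/* sort direction */
	bool		pk_nulls_first; /* do NULLs sort before other values? */
} PathKey;

#define MAX_CANON_PATHKEYS 64	/* arbitrary limit for this sketch */

static PathKey canon_pathkeys[MAX_CANON_PATHKEYS];
static int	num_canon_pathkeys = 0;

/*
 * Return the one and only PathKey object with the given properties,
 * creating it on first request.  Returns NULL if the table is full.
 */
PathKey *
get_canonical_pathkey(EquivalenceClass *eclass, unsigned int opfamily,
					  int strategy, bool nulls_first)
{
	PathKey    *pk;
	int			i;

	assert(eclass != NULL);

	for (i = 0; i < num_canon_pathkeys; i++)
	{
		pk = &canon_pathkeys[i];
		if (pk->pk_eclass == eclass &&
			pk->pk_opfamily == opfamily &&
			pk->pk_strategy == strategy &&
			pk->pk_nulls_first == nulls_first)
			return pk;			/* already have it; never make a copy */
	}

	if (num_canon_pathkeys >= MAX_CANON_PATHKEYS)
		return NULL;

	pk = &canon_pathkeys[num_canon_pathkeys++];
	pk->pk_eclass = eclass;
	pk->pk_opfamily = opfamily;
	pk->pk_strategy = strategy;
	pk->pk_nulls_first = nulls_first;
	return pk;
}

/* Two canonical pathkey lists match iff they hold the same pointers. */
bool
canonical_pathkeys_equal(PathKey *const *a, int na,
						 PathKey *const *b, int nb)
{
	int			i;

	if (na != nb)
		return false;
	for (i = 0; i < na; i++)
	{
		if (a[i] != b[i])
			return false;
	}
	return true;
}
```

The linear search makes creating a PathKey slower, but creation is rare
compared to the number of comparisons, so the trade seems worthwhile.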
Pathkeys are also useful to represent an ordering that we wish to achieve,
since they are easily compared to the pathkeys of a potential candidate
path. So, SortClause lists are turned into pathkeys lists for use inside
the optimizer.
OK, now for how it *really* works:
We did implement pathkeys just as described above, and found that the
planner spent a huge amount of time comparing pathkeys, because the
representation of pathkeys as unordered lists made it expensive to decide
whether two were equal or not. So, we've modified the representation
as described next.
If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
during planner startup, we can construct lists of equivalent pathkey items
for the query. There could be more than two items per equivalence set;
for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
Any pathkey item that belongs to an equivalence set implies that all the
other items in its set apply to the relation too, or at least all the ones
that are for fields present in the relation. (Some of the items in the
set might be for as-yet-unjoined relations.) Furthermore, any multi-item
pathkey sublist that appears at any stage of planning the query *must* be
a subset of one or another of these equivalence sets; there's no way we'd
have put two items in the same pathkey sublist unless they were equijoined
in WHERE.
Now suppose that we allow a pathkey sublist to contain pathkey items for
vars that are not yet part of the pathkey's relation. This introduces
no logical difficulty, because such items can easily be seen to be
irrelevant; we just mandate that they be ignored. But having allowed
this, we can declare (by fiat) that any multiple-item pathkey sublist
must be "equal()" to the appropriate equivalence set. In effect,
whenever we make a pathkey sublist that mentions any var appearing in an
equivalence set, we instantly add all the other vars equivalenced to it,
whether they appear yet in the pathkey's relation or not. And we also
mandate that the pathkey sublist appear in the same order as the
equivalence set it comes from.
In fact, we can go even further, and say that the canonical representation
of a pathkey sublist is a pointer directly to the relevant equivalence set,
which is kept in a list of pathkey equivalence sets for the query. Then
pathkey sublist comparison reduces to pointer-equality checking! To do this
we also have to add single-element pathkey sublists to the query's list of
equivalence sets, but that's a small price to pay.
By the way, it's OK and even useful for us to build equivalence sets
that mention multiple vars from the same relation. For example, if
we have WHERE A.X = A.Y and we are scanning A using an index on X,
we can legitimately conclude that the path is sorted by Y as well;
and this could be handy if Y is the variable used in other join clauses
or ORDER BY. So, any WHERE clause with a mergejoinable operator can
contribute to an equivalence set, even if it's not a join clause.
As sketched so far, equijoin operators allow us to conclude that
A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
datatypes are involved. What is not immediately obvious is that to use
the "canonical pathkey" representation, we *must* make this deduction.
An example (from a real bug in Postgres 7.0) is a mergejoin for a query
like
SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
(ie, both appear in the same canonical pathkey set). If we sort t1
and then apply a mergejoin, we *must* filter the t1 tuples using the
implied qualification f1 = f2, because otherwise the output of the sort
will be ordered by f1 or f2 (whichever we sort on) but not both. The
merge will then fail since (depending on which qual clause it applies
first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
actual output of the sort has neither of these orderings. The best fix
for this is to generate all the implied equality constraints for each
equijoin set and add these clauses to the query's qualification list.
In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
clause. The constraint will be applied as a qpqual to the output of the
scan on t1, resulting in sort output that is indeed ordered by both vars.
This approach provides more information to the selectivity estimation
code than it would otherwise have, and reduces the number of tuples
processed in join stages, so it's a win to make these deductions even
if we weren't forced to.
When we generate implied equality constraints, we may find ourselves
adding redundant clauses to specific relations. For example, consider
SELECT * FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c;
We will generate the implied clause t1.a = t3.c and add it to the tree.
This is good since it allows us to consider joining t1 and t3 directly,
which we otherwise wouldn't do. But when we reach the stage of joining
all three relations, we will have redundant join clauses --- eg, if we
join t1 and t2 first, then the path that joins (t1 t2) to t3 will have
both t2.b = t3.c and t1.a = t3.c as restriction clauses. This is bad;
not only is evaluation of the extra clause useless work at runtime,
but the selectivity estimator routines will underestimate the number
of tuples produced since they won't know that the two clauses are
perfectly redundant. We fix this by detecting and removing redundant
clauses as the restriction clause list is built for each join. (We
can't do it sooner, since which clauses are redundant will vary depending
on the join order.)
Yet another implication of all this is that mergejoinable operators
must form closed equivalence sets. For example, if "int2 = int4"
and "int4 = int8" are both marked mergejoinable, then there had better
be a mergejoinable "int2 = int8" operator as well. Otherwise, when
we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
while trying to create a representation of the implied clause
int2var = int8var.
Because we have to generate pathkeys lists from the sort clauses before
we've finished EquivalenceClass merging, we cannot use the pointer-equality
method of comparing PathKeys in the earliest stages of the planning
process. Instead, we generate "non canonical" PathKeys that reference
single-element EquivalenceClasses that might get merged later. After we
complete EquivalenceClass merging, we replace these with "canonical"
PathKeys that reference only fully-merged classes, and after that we make
sure we don't generate more than one copy of each "canonical" PathKey.
Then it is safe to use pointer comparison on canonical PathKeys.
An additional refinement we can make is to insist that canonical pathkey
lists (sort orderings) do not mention the same pathkey set more than once.
For example, a pathkey list ((A) (B) (A)) is redundant --- the second
occurrence of (A) does not change the ordering, since the data must already
be sorted by A. Although a user probably wouldn't write ORDER BY A,B,A
directly, such redundancies are more probable once equijoin equivalences
have been considered. Also, the system is likely to generate redundant
pathkey lists when computing the sort ordering needed for a mergejoin. By
eliminating the redundancy, we save time and improve planning, since the
planner will more easily recognize equivalent orderings as being equivalent.
lists (sort orderings) do not mention the same EquivalenceClass more than
once. For example, in all these cases the second sort column is redundant,
because it cannot distinguish values that are the same according to the
first sort column:
SELECT ... ORDER BY x, x
SELECT ... ORDER BY x, x DESC
SELECT ... WHERE x = y ORDER BY x, y
Although a user probably wouldn't write "ORDER BY x,x" directly, such
redundancies are more probable once equivalence classes have been
considered. Also, the system may generate redundant pathkey lists when
computing the sort ordering needed for a mergejoin. By eliminating the
redundancy, we save time and improve planning, since the planner will more
easily recognize equivalent orderings as being equivalent.
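
The redundancy rule can be sketched as a simple filter over a pathkey
list. This is only an illustration under simplifying assumptions: the
structs are cut-down stand-ins, the list is a plain array, and the function
name remove_redundant_pathkeys is invented for the example. Note that the
test looks only at the EquivalenceClass pointer, so a later key is dropped
even if it asks for a different direction, as in "ORDER BY x, x DESC".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the planner's structures (illustration only). */
typedef struct EquivalenceClass
{
	int			ec_id;			/* illustrative identifier, not a real field */
} EquivalenceClass;

typedef struct PathKey
{
	EquivalenceClass *pk_eclass;	/* the set of values that is ordered */
	int			pk_strategy;	/* sort direction */
} PathKey;

/*
 * Compact keys[0 .. nkeys-1] in place, dropping every key whose
 * EquivalenceClass was already mentioned by an earlier key.
 * Returns the new number of keys.
 */
int
remove_redundant_pathkeys(PathKey *keys, int nkeys)
{
	int			nkept = 0;
	int			i;
	int			j;

	assert(keys != NULL || nkeys == 0);

	for (i = 0; i < nkeys; i++)
	{
		bool		redundant = false;

		for (j = 0; j < nkept; j++)
		{
			if (keys[j].pk_eclass == keys[i].pk_eclass)
			{
				/* rows equal on the earlier key are equal here too */
				redundant = true;
				break;
			}
		}
		if (!redundant)
			keys[nkept++] = keys[i];
	}
	return nkept;
}
```

For "WHERE x = y ORDER BY x, y" both sort columns belong to the same
EquivalenceClass, so the second key is removed by the same pointer test.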
Another interesting property is that if the underlying EquivalenceClass
contains a constant and is not below an outer join, then the pathkey is
completely redundant and need not be sorted by at all! Every row must
contain the same constant value, so there's no need to sort. (If the EC is
below an outer join, we still have to sort, since some of the rows might
have gone to null and others not. In this case we must be careful to pick
a non-const member to sort by. The assumption that all the non-const
members go to null at the same plan level is critical here, else they might
not produce the same sort order.) This might seem pointless because users
are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
recognize when particular index columns are irrelevant to the sort order:
if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
produces correctly ordered data without a sort step. We used to have very
ugly ad-hoc code to recognize that in limited contexts, but discarding
constant ECs from pathkeys makes it happen cleanly and automatically.
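
A minimal sketch of the constant-EC rule follows, again under simplifying
assumptions: a pathkey list is reduced to an array of EquivalenceClass
pointers, the struct carries only two flags (named here ec_has_const and
ec_below_outer_join), and the function name strip_constant_pathkeys is
invented for the example. It shows how an index on (x,y) ends up with the
same ordering as "ORDER BY y" once x is known equal to a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the planner's structure (illustration only). */
typedef struct EquivalenceClass
{
	bool		ec_has_const;	/* does the class contain a constant? */
	bool		ec_below_outer_join;	/* might members go to null? */
} EquivalenceClass;

/*
 * Compact keys[0 .. nkeys-1] in place, dropping sort keys that cannot
 * affect the ordering because every row has the same constant value.
 * Keys below an outer join are kept, since some rows might have gone
 * to null.  Returns the new number of keys.
 */
int
strip_constant_pathkeys(EquivalenceClass **keys, int nkeys)
{
	int			nkept = 0;
	int			i;

	assert(keys != NULL || nkeys == 0);

	for (i = 0; i < nkeys; i++)
	{
		EquivalenceClass *ec = keys[i];

		if (ec->ec_has_const && !ec->ec_below_outer_join)
			continue;			/* constant column: no need to sort by it */
		keys[nkept++] = ec;
	}
	return nkept;
}
```

With "WHERE x = 42", the index ordering (x, y) reduces to (y), which is
exactly the list built for "ORDER BY y", so no sort step is needed.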
You might object that a below-outer-join EquivalenceClass doesn't always
represent the same values at every level of the join tree, and so using
it to uniquely identify a sort order is dubious. This is true, but we
can avoid dealing with the fact explicitly because we always consider that
an outer join destroys any ordering of its nullable inputs. Thus, even
if a path was sorted by {a.x} below an outer join, we'll re-sort if that
sort ordering was important; and so using the same PathKey for both sort
orderings doesn't create any real problem.
Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
coding of our optimizer, he is available to field questions about

@ -4,7 +4,7 @@
# Makefile for optimizer/path
#
# IDENTIFICATION
# $PostgreSQL: pgsql/src/backend/optimizer/path/Makefile,v 1.17 2007/01/20 17:16:11 petere Exp $
# $PostgreSQL: pgsql/src/backend/optimizer/path/Makefile,v 1.18 2007/01/20 20:45:38 tgl Exp $
#
#-------------------------------------------------------------------------
@ -12,7 +12,7 @@ subdir = src/backend/optimizer/path
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = allpaths.o clausesel.o costsize.o indxpath.o \
OBJS = allpaths.o clausesel.o costsize.o equivclass.o indxpath.o \
joinpath.o joinrels.o orindxpath.o pathkeys.o tidpath.o
all: SUBSYS.o

@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/path/allpaths.c,v 1.156 2007/01/09 02:14:12 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/path/allpaths.c,v 1.157 2007/01/20 20:45:38 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -325,6 +325,16 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
adjust_appendrel_attrs((Node *) rel->joininfo,
appinfo);
/*
* We have to make child entries in the EquivalenceClass data
* structures as well.
*/
if (rel->has_eclass_joins)
{
add_child_rel_equivalences(root, appinfo, rel, childrel);
childrel->has_eclass_joins = true;
}
/*
* Copy the parent's attr_needed data as well, with appropriate
* adjustment of relids and attribute numbers.

@ -54,7 +54,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.174 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.175 2007/01/20 20:45:38 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -1258,8 +1258,6 @@ cost_mergejoin(MergePath *path, PlannerInfo *root)
Path *outer_path = path->jpath.outerjoinpath;
Path *inner_path = path->jpath.innerjoinpath;
List *mergeclauses = path->path_mergeclauses;
Oid *mergeFamilies = path->path_mergeFamilies;
int *mergeStrategies = path->path_mergeStrategies;
List *outersortkeys = path->outersortkeys;
List *innersortkeys = path->innersortkeys;
Cost startup_cost = 0;
@ -1268,7 +1266,6 @@ cost_mergejoin(MergePath *path, PlannerInfo *root)
Selectivity merge_selec;
QualCost merge_qual_cost;
QualCost qp_qual_cost;
RestrictInfo *firstclause;
double outer_path_rows = PATH_ROWS(outer_path);
double inner_path_rows = PATH_ROWS(inner_path);
double outer_rows,
@ -1347,32 +1344,47 @@ cost_mergejoin(MergePath *path, PlannerInfo *root)
* inputs that will actually need to be scanned. We use only the first
* (most significant) merge clause for this purpose.
*
* Since this calculation is somewhat expensive, and will be the same for
* all mergejoin paths associated with the merge clause, we cache the
* results in the RestrictInfo node. XXX that won't work anymore once
* we support multiple possible orderings!
* XXX mergejoinscansel is a bit expensive, can we cache its results?
*/
if (mergeclauses && path->jpath.jointype != JOIN_FULL)
{
firstclause = (RestrictInfo *) linitial(mergeclauses);
if (firstclause->left_mergescansel < 0) /* not computed yet? */
mergejoinscansel(root, (Node *) firstclause->clause,
mergeFamilies[0],
mergeStrategies[0],
&firstclause->left_mergescansel,
&firstclause->right_mergescansel);
RestrictInfo *firstclause = (RestrictInfo *) linitial(mergeclauses);
List *opathkeys;
List *ipathkeys;
PathKey *opathkey;
PathKey *ipathkey;
Selectivity leftscansel,
rightscansel;
if (bms_is_subset(firstclause->left_relids, outer_path->parent->relids))
/* Get the input pathkeys to determine the sort-order details */
opathkeys = outersortkeys ? outersortkeys : outer_path->pathkeys;
ipathkeys = innersortkeys ? innersortkeys : inner_path->pathkeys;
Assert(opathkeys);
Assert(ipathkeys);
opathkey = (PathKey *) linitial(opathkeys);
ipathkey = (PathKey *) linitial(ipathkeys);
/* debugging check */
if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
opathkey->pk_strategy != ipathkey->pk_strategy ||
opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
elog(ERROR, "left and right pathkeys do not match in mergejoin");
mergejoinscansel(root, (Node *) firstclause->clause,
opathkey->pk_opfamily, opathkey->pk_strategy,
&leftscansel, &rightscansel);
if (bms_is_subset(firstclause->left_relids,
outer_path->parent->relids))
{
/* left side of clause is outer */
outerscansel = firstclause->left_mergescansel;
innerscansel = firstclause->right_mergescansel;
outerscansel = leftscansel;
innerscansel = rightscansel;
}
else
{
/* left side of clause is inner */
outerscansel = firstclause->right_mergescansel;
innerscansel = firstclause->left_mergescansel;
outerscansel = rightscansel;
innerscansel = leftscansel;
}
if (path->jpath.jointype == JOIN_LEFT)
outerscansel = 1.0;

File diff suppressed because it is too large

@ -9,7 +9,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.215 2007/01/09 02:14:12 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.216 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -32,7 +32,6 @@
#include "optimizer/var.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
#include "utils/selfuncs.h"
@ -72,21 +71,11 @@ static bool match_rowcompare_to_indexcol(IndexOptInfo *index,
Oid opfamily,
RowCompareExpr *clause,
Relids outer_relids);
static Relids indexable_outerrelids(RelOptInfo *rel);
static Relids indexable_outerrelids(PlannerInfo *root, RelOptInfo *rel);
static bool matches_any_index(RestrictInfo *rinfo, RelOptInfo *rel,
Relids outer_relids);
static List *find_clauses_for_join(PlannerInfo *root, RelOptInfo *rel,
Relids outer_relids, bool isouterjoin);
static ScanDirection match_variant_ordering(PlannerInfo *root,
IndexOptInfo *index,
List *restrictclauses);
static List *identify_ignorable_ordering_cols(PlannerInfo *root,
IndexOptInfo *index,
List *restrictclauses);
static bool match_index_to_query_keys(PlannerInfo *root,
IndexOptInfo *index,
ScanDirection indexscandir,
List *ignorables);
static bool match_boolean_index_clause(Node *clause, int indexcol,
IndexOptInfo *index);
static bool match_special_index_operator(Expr *clause, Oid opfamily,
@ -157,7 +146,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
* participate in such join clauses. We'll use this set later to
* recognize outer rels that are equivalent for joining purposes.
*/
rel->index_outer_relids = indexable_outerrelids(rel);
rel->index_outer_relids = indexable_outerrelids(root, rel);
/*
* Find all the index paths that are directly usable for this relation
@ -351,8 +340,7 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
if (index_is_ordered && istoplevel && outer_rel == NULL)
{
index_pathkeys = build_index_pathkeys(root, index,
ForwardScanDirection,
true);
ForwardScanDirection);
useful_pathkeys = truncate_useless_pathkeys(root, rel,
index_pathkeys);
}
@ -378,23 +366,21 @@ find_usable_indexes(PlannerInfo *root, RelOptInfo *rel,
}
/*
* 4. If the index is ordered, and there is a requested query ordering
* that we failed to match, consider variant ways of achieving the
* ordering. Again, this is only interesting at top level.
* 4. If the index is ordered, a backwards scan might be
* interesting. Again, this is only interesting at top level.
*/
if (index_is_ordered && istoplevel && outer_rel == NULL &&
root->query_pathkeys != NIL &&
pathkeys_useful_for_ordering(root, useful_pathkeys) == 0)
if (index_is_ordered && istoplevel && outer_rel == NULL)
{
ScanDirection scandir;
scandir = match_variant_ordering(root, index, restrictclauses);
if (!ScanDirectionIsNoMovement(scandir))
index_pathkeys = build_index_pathkeys(root, index,
BackwardScanDirection);
useful_pathkeys = truncate_useless_pathkeys(root, rel,
index_pathkeys);
if (useful_pathkeys != NIL)
{
ipath = create_index_path(root, index,
restrictclauses,
root->query_pathkeys,
scandir,
useful_pathkeys,
BackwardScanDirection,
outer_rel);
result = lappend(result, ipath);
}
@ -1207,19 +1193,6 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
List *restrictinfo_list = rel->baserestrictinfo;
ListCell *ilist;
/*
* Note: if Postgres tried to optimize queries by forming equivalence
* classes over equi-joined attributes (i.e., if it recognized that a
* qualification such as "where a.b=c.d and a.b=5" could make use of an
* index on c.d), then we could use that equivalence class info here with
* joininfo lists to do more complete tests for the usability of a partial
* index. For now, the test only uses restriction clauses (those in
* baserestrictinfo). --Nels, Dec '92
*
* XXX as of 7.1, equivalence class info *is* available. Consider
* improving this code as foreseen by Nels.
*/
foreach(ilist, rel->indexlist)
{
IndexOptInfo *index = (IndexOptInfo *) lfirst(ilist);
@ -1242,18 +1215,19 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
* for the specified table. Returns a set of relids.
*/
static Relids
indexable_outerrelids(RelOptInfo *rel)
indexable_outerrelids(PlannerInfo *root, RelOptInfo *rel)
{
Relids outer_relids = NULL;
ListCell *l;
bool is_child_rel = (rel->reloptkind == RELOPT_OTHER_MEMBER_REL);
ListCell *lc1;
/*
* Examine each joinclause in the joininfo list to see if it matches any
* key of any index. If so, add the clause's other rels to the result.
*/
foreach(l, rel->joininfo)
foreach(lc1, rel->joininfo)
{
RestrictInfo *joininfo = (RestrictInfo *) lfirst(l);
RestrictInfo *joininfo = (RestrictInfo *) lfirst(lc1);
Relids other_rels;
other_rels = bms_difference(joininfo->required_relids, rel->relids);
@ -1263,6 +1237,71 @@ indexable_outerrelids(RelOptInfo *rel)
bms_free(other_rels);
}
/*
* We also have to look through the query's EquivalenceClasses to see
* if any of them could generate indexable join conditions for this rel.
*/
if (rel->has_eclass_joins)
{
foreach(lc1, root->eq_classes)
{
EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
Relids other_rels = NULL;
bool found_index = false;
ListCell *lc2;
/*
* Won't generate joinclauses if const or single-member (the latter
* test covers the volatile case too)
*/
if (cur_ec->ec_has_const || list_length(cur_ec->ec_members) <= 1)
continue;
/*
* Note we don't test ec_broken; if we did, we'd need a separate
* code path to look through ec_sources. Checking the members
* anyway is OK as a possibly-overoptimistic heuristic.
*/
/*
* No point in searching if rel not mentioned in eclass (but we
* can't tell that for a child rel).
*/
if (!is_child_rel &&
!bms_is_subset(rel->relids, cur_ec->ec_relids))
continue;
/*
* Scan members, looking for both an index match and join
* candidates
*/
foreach(lc2, cur_ec->ec_members)
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
/* Join candidate? */
if (!cur_em->em_is_child &&
!bms_overlap(cur_em->em_relids, rel->relids))
{
other_rels = bms_add_members(other_rels,
cur_em->em_relids);
continue;
}
/* Check for index match (only need one) */
if (!found_index &&
bms_equal(cur_em->em_relids, rel->relids) &&
eclass_matches_any_index(cur_ec, cur_em, rel))
found_index = true;
}
if (found_index)
outer_relids = bms_join(outer_relids, other_rels);
else
bms_free(other_rels);
}
}
return outer_relids;
}
@ -1339,6 +1378,42 @@ matches_any_index(RestrictInfo *rinfo, RelOptInfo *rel, Relids outer_relids)
return false;
}
/*
* eclass_matches_any_index
* Workhorse for indexable_outerrelids: see if an EquivalenceClass member
* can be matched to any index column of the given rel.
*
* This is also exported for use by find_eclass_clauses_for_index_join.
*/
bool
eclass_matches_any_index(EquivalenceClass *ec, EquivalenceMember *em,
RelOptInfo *rel)
{
ListCell *l;
foreach(l, rel->indexlist)
{
IndexOptInfo *index = (IndexOptInfo *) lfirst(l);
int indexcol = 0;
Oid *families = index->opfamily;
do
{
Oid curFamily = families[0];
if (list_member_oid(ec->ec_opfamilies, curFamily) &&
match_index_to_operand((Node *) em->em_expr, indexcol, index))
return true;
indexcol++;
families++;
} while (!DoneMatchingIndexKeys(families));
}
return false;
}
/*
* best_inner_indexscan
* Finds the best available inner indexscan for a nestloop join
@ -1393,12 +1468,12 @@ best_inner_indexscan(PlannerInfo *root, RelOptInfo *rel,
return NULL;
/*
* Otherwise, we have to do path selection in the memory context of the
* given rel, so that any created path can be safely attached to the rel's
* cache of best inner paths. (This is not currently an issue for normal
* planning, but it is an issue for GEQO planning.)
* Otherwise, we have to do path selection in the main planning context,
* so that any created path can be safely attached to the rel's cache of
* best inner paths. (This is not currently an issue for normal planning,
* but it is an issue for GEQO planning.)
*/
oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
oldcontext = MemoryContextSwitchTo(root->planner_cxt);
/*
* Intersect the given outer relids with index_outer_relids to find the
@ -1539,7 +1614,12 @@ find_clauses_for_join(PlannerInfo *root, RelOptInfo *rel,
Relids join_relids;
ListCell *l;
/* Look for joinclauses that are usable with given outer_relids */
/*
* Look for joinclauses that are usable with given outer_relids. Note
* we'll take anything that's applicable to the join whether it has
* anything to do with an index or not; since we're only building a list,
* it's not worth filtering more finely here.
*/
join_relids = bms_union(rel->relids, outer_relids);
foreach(l, rel->joininfo)
@ -1557,276 +1637,27 @@ find_clauses_for_join(PlannerInfo *root, RelOptInfo *rel,
bms_free(join_relids);
/* if no join clause was matched then forget it, per comments above */
/*
* Also check to see if any EquivalenceClasses can produce a relevant
* joinclause. Since all such clauses are effectively pushed-down,
* this doesn't apply to outer joins.
*/
if (!isouterjoin && rel->has_eclass_joins)
clause_list = list_concat(clause_list,
find_eclass_clauses_for_index_join(root,
rel,
outer_relids));
/* If no join clause was matched then forget it, per comments above */
if (clause_list == NIL)
return NIL;
/*
* We can also use any plain restriction clauses for the rel. We put
* these at the front of the clause list for the convenience of
* remove_redundant_join_clauses, which can never remove non-join clauses
* and hence won't be able to get rid of a non-join clause if it appears
* after a join clause it is redundant with.
*/
/* We can also use any plain restriction clauses for the rel */
clause_list = list_concat(list_copy(rel->baserestrictinfo), clause_list);
/*
* We may now have clauses that are known redundant. Get rid of 'em.
*/
if (list_length(clause_list) > 1)
{
clause_list = remove_redundant_join_clauses(root,
clause_list,
isouterjoin);
}
return clause_list;
}
/****************************************************************************
* ---- ROUTINES TO HANDLE PATHKEYS ----
****************************************************************************/
/*
* match_variant_ordering
* Try to match an index's ordering to the query's requested ordering
*
* This is used when the index is ordered but a naive comparison fails to
* match its ordering (pathkeys) to root->query_pathkeys. It may be that
* we need to scan the index backwards. Also, a less naive comparison can
* help for both forward and backward indexscans. Columns of the index
* that have an equality restriction clause can be ignored in the match;
* that is, an index on (x,y) can be considered to match the ordering of
* ... WHERE x = 42 ORDER BY y;
*
* Note: it would be possible to similarly ignore useless ORDER BY items;
* that is, an index on just y could be considered to match the ordering of
* ... WHERE x = 42 ORDER BY x, y;
* But proving that this is safe would require finding a btree opfamily
* containing both the = operator and the < or > operator in the ORDER BY
* item. That's significantly more expensive than what we do here, since
* we'd have to look at restriction clauses unrelated to the current index
* and search for opfamilies without any hint from the index. The practical
* use-cases seem to be mostly covered by ignoring index columns, so that's
* all we do for now.
*
* Inputs:
* 'index' is the index of interest.
* 'restrictclauses' is the list of sublists of restriction clauses
* matching the columns of the index (NIL if none)
*
* If able to match the requested query pathkeys, returns either
* ForwardScanDirection or BackwardScanDirection to indicate the proper index
* scan direction. If no match, returns NoMovementScanDirection.
*/
static ScanDirection
match_variant_ordering(PlannerInfo *root,
IndexOptInfo *index,
List *restrictclauses)
{
List *ignorables;
/*
* Forget the whole thing if not a btree index; our check for ignorable
* columns assumes we are dealing with btree opfamilies. (It'd be possible
* to factor out just the try for backwards indexscan, but considering
* that we presently have no orderable indexes except btrees anyway, it's
* hardly worth contorting this code for that case.)
*
* Note: if you remove this, you probably need to put in a check on
* amoptionalkey to prevent possible clauseless scan on an index that
* won't cope.
*/
if (index->relam != BTREE_AM_OID)
return NoMovementScanDirection;
/*
* Figure out which index columns can be optionally ignored because they
* have an equality constraint. This is the same set for either forward
* or backward scan, so we do it just once.
*/
ignorables = identify_ignorable_ordering_cols(root, index,
restrictclauses);
/*
* Try to match to forward scan, then backward scan. However, we can skip
* the forward-scan case if there are no ignorable columns, because
* find_usable_indexes() would have found the match already.
*/
if (ignorables &&
match_index_to_query_keys(root, index, ForwardScanDirection,
ignorables))
return ForwardScanDirection;
if (match_index_to_query_keys(root, index, BackwardScanDirection,
ignorables))
return BackwardScanDirection;
return NoMovementScanDirection;
}
/*
* identify_ignorable_ordering_cols
* Determine which index columns can be ignored for ordering purposes
*
* Returns an integer List of column numbers (1-based) of ignorable
* columns. The ignorable columns are those that have equality constraints
* against pseudoconstants.
*/
static List *
identify_ignorable_ordering_cols(PlannerInfo *root,
IndexOptInfo *index,
List *restrictclauses)
{
List *result = NIL;
int indexcol = 0; /* note this is 0-based */
ListCell *l;
/* restrictclauses is either NIL or has a sublist per column */
foreach(l, restrictclauses)
{
List *sublist = (List *) lfirst(l);
Oid opfamily = index->opfamily[indexcol];
ListCell *l2;
foreach(l2, sublist)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l2);
OpExpr *clause = (OpExpr *) rinfo->clause;
Oid clause_op;
int op_strategy;
bool varonleft;
bool ispc;
/* First check for boolean-index cases. */
if (IsBooleanOpfamily(opfamily))
{
if (match_boolean_index_clause((Node *) clause, indexcol,
index))
{
/*
* The clause means either col = TRUE or col = FALSE; we
* do not care which, it's an equality constraint either
* way.
*/
result = lappend_int(result, indexcol + 1);
break;
}
}
/* Otherwise, ignore if not a binary opclause */
if (!is_opclause(clause) || list_length(clause->args) != 2)
continue;
/* Determine left/right sides and check the operator */
clause_op = clause->opno;
if (match_index_to_operand(linitial(clause->args), indexcol,
index))
{
/* clause_op is correct */
varonleft = true;
}
else
{
Assert(match_index_to_operand(lsecond(clause->args), indexcol,
index));
/* Must flip operator to get the opfamily member */
clause_op = get_commutator(clause_op);
varonleft = false;
}
if (!OidIsValid(clause_op))
continue; /* ignore non match, per next comment */
op_strategy = get_op_opfamily_strategy(clause_op, opfamily);
/*
* You might expect to see Assert(op_strategy != 0) here, but you
* won't: the clause might contain a special indexable operator
* rather than an ordinary opfamily member. Currently none of the
* special operators are very likely to expand to an equality
* operator; we do not bother to check, but just assume no match.
*/
if (op_strategy != BTEqualStrategyNumber)
continue;
/* Now check that other side is pseudoconstant */
if (varonleft)
ispc = is_pseudo_constant_clause_relids(lsecond(clause->args),
rinfo->right_relids);
else
ispc = is_pseudo_constant_clause_relids(linitial(clause->args),
rinfo->left_relids);
if (ispc)
{
result = lappend_int(result, indexcol + 1);
break;
}
}
indexcol++;
}
return result;
}
/*
* match_index_to_query_keys
* Check a single scan direction for "intelligent" match to query keys
*
* 'index' is the index of interest.
* 'indexscandir' is the scan direction to consider
* 'ignorables' is an integer list of indexes of ignorable index columns
*
* Returns TRUE on successful match (ie, the query_pathkeys can be considered
* to match this index).
*/
static bool
match_index_to_query_keys(PlannerInfo *root,
IndexOptInfo *index,
ScanDirection indexscandir,
List *ignorables)
{
List *index_pathkeys;
ListCell *index_cell;
int index_col;
ListCell *r;
/* Get the pathkeys that exactly describe the index */
index_pathkeys = build_index_pathkeys(root, index, indexscandir, false);
/*
* Can we match to the query's requested pathkeys? The inner loop skips
* over ignorable index columns while trying to match.
*/
index_cell = list_head(index_pathkeys);
index_col = 0;
foreach(r, root->query_pathkeys)
{
List *rsubkey = (List *) lfirst(r);
for (;;)
{
List *isubkey;
if (index_cell == NULL)
return false;
isubkey = (List *) lfirst(index_cell);
index_cell = lnext(index_cell);
index_col++; /* index_col is now 1-based */
/*
* Since we are dealing with canonicalized pathkeys, pointer
* comparison is sufficient to determine a match.
*/
if (rsubkey == isubkey)
break; /* matched current query pathkey */
if (!list_member_int(ignorables, index_col))
return false; /* definite failure to match */
/* otherwise loop around and try to match to next index col */
}
}
return true;
}
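
Annotation (not part of the commit): the matching loop in match_index_to_query_keys can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the planner's code: pathkeys are modeled as plain integer IDs (equal IDs stand in for pointer-equal canonical pathkeys), and the names match_keys and is_ignorable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is the 1-based column number in the ignorable set? */
static bool
is_ignorable(const int *ignorables, size_t nignorables, int col)
{
    for (size_t i = 0; i < nignorables; i++)
        if (ignorables[i] == col)
            return true;
    return false;
}

/*
 * Returns true if every query key is matched, in order, by an index key.
 * An index column that does not match the current query key may be skipped
 * only if its 1-based column number appears in ignorables[].
 */
static bool
match_keys(const int *index_keys, size_t nindex,
           const int *query_keys, size_t nquery,
           const int *ignorables, size_t nignorables)
{
    size_t pos = 0;             /* next index column to examine, 0-based */

    assert(query_keys != NULL || nquery == 0);

    for (size_t q = 0; q < nquery; q++)
    {
        for (;;)
        {
            int key;

            if (pos >= nindex)
                return false;   /* ran out of index columns */
            key = index_keys[pos];
            pos++;              /* pos is now the 1-based column number */
            if (key == query_keys[q])
                break;          /* matched current query key */
            if (!is_ignorable(ignorables, nignorables, (int) pos))
                return false;   /* definite failure to match */
            /* otherwise try the next index column */
        }
    }
    return true;
}
```

With an index on (x, y) modeled as keys {1, 2}, a query ordering of {2} matches only when column 1 is listed as ignorable, which corresponds to the WHERE x = 42 ORDER BY y case described in the comments above.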
/****************************************************************************
* ---- PATH CREATION UTILITIES ----


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/path/joinpath.c,v 1.110 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/path/joinpath.c,v 1.111 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -16,7 +16,6 @@
#include <math.h>
#include "access/skey.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@ -40,10 +39,6 @@ static List *select_mergejoin_clauses(RelOptInfo *joinrel,
RelOptInfo *innerrel,
List *restrictlist,
JoinType jointype);
static void build_mergejoin_strat_arrays(List *mergeclauses,
Oid **mergefamilies,
int **mergestrategies,
bool **mergenullsfirst);
/*
@ -205,9 +200,9 @@ sort_inner_and_outer(PlannerInfo *root,
*
* Actually, it's not quite true that every mergeclause ordering will
* generate a different path order, because some of the clauses may be
* redundant. Therefore, what we do is convert the mergeclause list to a
* list of canonical pathkeys, and then consider different orderings of
* the pathkeys.
* partially redundant (refer to the same EquivalenceClasses). Therefore,
* what we do is convert the mergeclause list to a list of canonical
* pathkeys, and then consider different orderings of the pathkeys.
*
* Generating a path for *every* permutation of the pathkeys doesn't seem
* like a winning strategy; the cost in planning time is too high. For
@ -216,76 +211,59 @@ sort_inner_and_outer(PlannerInfo *root,
* mergejoin without re-sorting against any other possible mergejoin
* partner path. But if we've not guessed the right ordering of secondary
* keys, we may end up evaluating clauses as qpquals when they could have
* been done as mergeclauses. We need to figure out a better way. (Two
* possible approaches: look at all the relevant index relations to
* suggest plausible sort orders, or make just one output path and somehow
* mark it as having a sort-order that can be rearranged freely.)
* been done as mergeclauses. (In practice, it's rare that there's more
* than two or three mergeclauses, so expending a huge amount of thought
* on that is probably not worth it.)
*
* The pathkey order returned by select_outer_pathkeys_for_merge() has
* some heuristics behind it (see that function), so be sure to try it
* exactly as-is as well as making variants.
*/
all_pathkeys = make_pathkeys_for_mergeclauses(root,
mergeclause_list,
outerrel);
all_pathkeys = select_outer_pathkeys_for_merge(root,
mergeclause_list,
joinrel);
foreach(l, all_pathkeys)
{
List *front_pathkey = (List *) lfirst(l);
List *cur_pathkeys;
List *cur_mergeclauses;
Oid *mergefamilies;
int *mergestrategies;
bool *mergenullsfirst;
List *outerkeys;
List *innerkeys;
List *merge_pathkeys;
/* Make a pathkey list with this guy first. */
/* Make a pathkey list with this guy first */
if (l != list_head(all_pathkeys))
cur_pathkeys = lcons(front_pathkey,
list_delete_ptr(list_copy(all_pathkeys),
front_pathkey));
outerkeys = lcons(front_pathkey,
list_delete_ptr(list_copy(all_pathkeys),
front_pathkey));
else
cur_pathkeys = all_pathkeys; /* no work at first one... */
outerkeys = all_pathkeys; /* no work at first one... */
/*
* Select mergeclause(s) that match this sort ordering. If we had
* redundant merge clauses then we will get a subset of the original
* clause list. There had better be some match, however...
*/
/* Sort the mergeclauses into the corresponding ordering */
cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
cur_pathkeys,
outerkeys,
true,
mergeclause_list);
Assert(cur_mergeclauses != NIL);
/* Forget it if can't use all the clauses in right/full join */
if (useallclauses &&
list_length(cur_mergeclauses) != list_length(mergeclause_list))
continue;
/* Should have used them all... */
Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
/* Build sort pathkeys for the inner side */
innerkeys = make_inner_pathkeys_for_merge(root,
cur_mergeclauses,
outerkeys);
/* Build pathkeys representing output sort order */
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerkeys);
/*
* Build sort pathkeys for both sides.
* And now we can make the path.
*
* Note: it's possible that the cheapest paths will already be sorted
* properly. create_mergejoin_path will detect that case and suppress
* an explicit sort step, so we needn't do so here.
*/
outerkeys = make_pathkeys_for_mergeclauses(root,
cur_mergeclauses,
outerrel);
innerkeys = make_pathkeys_for_mergeclauses(root,
cur_mergeclauses,
innerrel);
/* Build pathkeys representing output sort order. */
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerkeys);
/* Build opfamily info for execution */
build_mergejoin_strat_arrays(cur_mergeclauses,
&mergefamilies,
&mergestrategies,
&mergenullsfirst);
/*
* And now we can make the path.
*/
add_path(joinrel, (Path *)
create_mergejoin_path(root,
joinrel,
@ -295,9 +273,6 @@ sort_inner_and_outer(PlannerInfo *root,
restrictlist,
merge_pathkeys,
cur_mergeclauses,
mergefamilies,
mergestrategies,
mergenullsfirst,
outerkeys,
innerkeys));
}
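
Annotation (not part of the commit): the loop above tries each pathkey in turn as the leading sort key while leaving the others in their original relative order. A minimal sketch of that reordering, assuming pathkeys are modeled as integers in a plain array; move_to_front is a hypothetical name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Writes into out[] (which must have room for n entries) the ordering of
 * keys[] with keys[front] moved to the head and all remaining keys kept
 * in their original relative order.
 */
static void
move_to_front(const int *keys, size_t n, size_t front, int *out)
{
    size_t k = 0;

    assert(front < n);

    out[k++] = keys[front];
    for (size_t i = 0; i < n; i++)
        if (i != front)
            out[k++] = keys[i];
}
```

Calling this once per position yields n candidate orderings rather than all n! permutations, which is the trade-off the comment in sort_inner_and_outer describes.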
@ -427,9 +402,6 @@ match_unsorted_outer(PlannerInfo *root,
Path *outerpath = (Path *) lfirst(l);
List *merge_pathkeys;
List *mergeclauses;
Oid *mergefamilies;
int *mergestrategies;
bool *mergenullsfirst;
List *innersortkeys;
List *trialsortkeys;
Path *cheapest_startup_inner;
@ -510,6 +482,7 @@ match_unsorted_outer(PlannerInfo *root,
/* Look for useful mergeclauses (if any) */
mergeclauses = find_mergeclauses_for_pathkeys(root,
outerpath->pathkeys,
true,
mergeclause_list);
/*
@ -532,15 +505,9 @@ match_unsorted_outer(PlannerInfo *root,
continue;
/* Compute the required ordering of the inner path */
innersortkeys = make_pathkeys_for_mergeclauses(root,
mergeclauses,
innerrel);
/* Build opfamily info for execution */
build_mergejoin_strat_arrays(mergeclauses,
&mergefamilies,
&mergestrategies,
&mergenullsfirst);
innersortkeys = make_inner_pathkeys_for_merge(root,
mergeclauses,
outerpath->pathkeys);
/*
* Generate a mergejoin on the basis of sorting the cheapest inner.
@ -557,9 +524,6 @@ match_unsorted_outer(PlannerInfo *root,
restrictlist,
merge_pathkeys,
mergeclauses,
mergefamilies,
mergestrategies,
mergenullsfirst,
NIL,
innersortkeys));
@ -613,18 +577,12 @@ match_unsorted_outer(PlannerInfo *root,
newclauses =
find_mergeclauses_for_pathkeys(root,
trialsortkeys,
false,
mergeclauses);
Assert(newclauses != NIL);
}
else
newclauses = mergeclauses;
/* Build opfamily info for execution */
build_mergejoin_strat_arrays(newclauses,
&mergefamilies,
&mergestrategies,
&mergenullsfirst);
add_path(joinrel, (Path *)
create_mergejoin_path(root,
joinrel,
@ -634,9 +592,6 @@ match_unsorted_outer(PlannerInfo *root,
restrictlist,
merge_pathkeys,
newclauses,
mergefamilies,
mergestrategies,
mergenullsfirst,
NIL,
NIL));
cheapest_total_inner = innerpath;
@ -666,19 +621,13 @@ match_unsorted_outer(PlannerInfo *root,
newclauses =
find_mergeclauses_for_pathkeys(root,
trialsortkeys,
false,
mergeclauses);
Assert(newclauses != NIL);
}
else
newclauses = mergeclauses;
}
/* Build opfamily info for execution */
build_mergejoin_strat_arrays(newclauses,
&mergefamilies,
&mergestrategies,
&mergenullsfirst);
add_path(joinrel, (Path *)
create_mergejoin_path(root,
joinrel,
@ -688,9 +637,6 @@ match_unsorted_outer(PlannerInfo *root,
restrictlist,
merge_pathkeys,
newclauses,
mergefamilies,
mergestrategies,
mergenullsfirst,
NIL,
NIL));
}
@ -909,6 +855,10 @@ best_appendrel_indexscan(PlannerInfo *root, RelOptInfo *rel,
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
* good for the duration of the current add_paths_to_joinrel() call!
*
* We examine each restrictinfo clause known for the join to see
* if it is mergejoinable and involves vars from the two sub-relations
* currently of interest.
@ -939,7 +889,7 @@ select_mergejoin_clauses(RelOptInfo *joinrel,
continue;
if (!restrictinfo->can_join ||
restrictinfo->mergejoinoperator == InvalidOid)
restrictinfo->mergeopfamilies == NIL)
{
have_nonmergeable_joinclause = true;
continue; /* not mergejoinable */
@ -954,11 +904,13 @@ select_mergejoin_clauses(RelOptInfo *joinrel,
bms_is_subset(restrictinfo->right_relids, innerrel->relids))
{
/* righthand side is inner */
restrictinfo->outer_is_left = true;
}
else if (bms_is_subset(restrictinfo->left_relids, innerrel->relids) &&
bms_is_subset(restrictinfo->right_relids, outerrel->relids))
{
/* lefthand side is inner */
restrictinfo->outer_is_left = false;
}
else
{
@ -966,7 +918,7 @@ select_mergejoin_clauses(RelOptInfo *joinrel,
continue; /* no good for these input relations */
}
result_list = lcons(restrictinfo, result_list);
result_list = lappend(result_list, restrictinfo);
}
/*
@ -995,46 +947,3 @@ select_mergejoin_clauses(RelOptInfo *joinrel,
return result_list;
}
/*
* Temporary hack to build opfamily and strategy info needed for mergejoin
* by the executor. We need to rethink the planner's handling of merge
* planning so that it can deal with multiple possible merge orders, but
* that's not done yet.
*/
static void
build_mergejoin_strat_arrays(List *mergeclauses,
Oid **mergefamilies,
int **mergestrategies,
bool **mergenullsfirst)
{
int nClauses = list_length(mergeclauses);
int i;
ListCell *l;
*mergefamilies = (Oid *) palloc(nClauses * sizeof(Oid));
*mergestrategies = (int *) palloc(nClauses * sizeof(int));
*mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));
i = 0;
foreach(l, mergeclauses)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
/*
* We do not need to worry about whether the mergeclause will be
* commuted at runtime --- it's the same opfamily either way.
*/
(*mergefamilies)[i] = restrictinfo->mergeopfamily;
/*
* For the moment, strategy must always be LessThan --- see
* hack version of get_op_mergejoin_info
*/
(*mergestrategies)[i] = BTLessStrategyNumber;
/* And we only allow NULLS LAST, too */
(*mergenullsfirst)[i] = false;
i++;
}
}


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/path/joinrels.c,v 1.83 2007/01/05 22:19:31 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/path/joinrels.c,v 1.84 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -72,7 +72,7 @@ make_rels_by_joins(PlannerInfo *root, int level, List **joinrels)
other_rels = list_head(joinrels[1]); /* consider all initial
* rels */
if (old_rel->joininfo != NIL)
if (old_rel->joininfo != NIL || old_rel->has_eclass_joins)
{
/*
* Note that if all available join clauses for this rel require
@ -152,7 +152,8 @@ make_rels_by_joins(PlannerInfo *root, int level, List **joinrels)
* outer joins --- then we might have to force a bushy outer
* join. See have_relevant_joinclause().
*/
if (old_rel->joininfo == NIL && root->oj_info_list == NIL)
if (old_rel->joininfo == NIL && !old_rel->has_eclass_joins &&
root->oj_info_list == NIL)
continue;
if (k == other_level)
@ -251,8 +252,7 @@ make_rels_by_joins(PlannerInfo *root, int level, List **joinrels)
/*
* make_rels_by_clause_joins
* Build joins between the given relation 'old_rel' and other relations
* that are mentioned within old_rel's joininfo list (i.e., relations
* that participate in join clauses that 'old_rel' also participates in).
* that participate in join clauses that 'old_rel' also participates in.
* The join rel nodes are returned in a list.
*
* 'old_rel' is the relation entry for the relation to be joined

File diff suppressed because it is too large


@ -10,7 +10,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.221 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/plan/createplan.c,v 1.222 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -121,8 +121,6 @@ static MergeJoin *make_mergejoin(List *tlist,
JoinType jointype);
static Sort *make_sort(PlannerInfo *root, Plan *lefttree, int numCols,
AttrNumber *sortColIdx, Oid *sortOperators, bool *nullsFirst);
static Sort *make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree,
List *pathkeys);
/*
@ -1425,23 +1423,21 @@ create_nestloop_plan(PlannerInfo *root,
* that have to be checked as qpquals at the join node.
*
* We can also remove any join clauses that are redundant with those
* being used in the index scan; prior redundancy checks will not have
* caught this case because the join clauses would never have been put
* in the same joininfo list.
* being used in the index scan; this check is needed because
* find_eclass_clauses_for_index_join() may emit different clauses
* than generate_join_implied_equalities() did.
*
* We can skip this if the index path is an ordinary indexpath and not
* a special innerjoin path.
* a special innerjoin path, since it then wouldn't be using any join
* clauses.
*/
IndexPath *innerpath = (IndexPath *) best_path->innerjoinpath;
if (innerpath->isjoininner)
{
joinrestrictclauses =
select_nonredundant_join_clauses(root,
joinrestrictclauses,
innerpath->indexclauses,
IS_OUTER_JOIN(best_path->jointype));
}
innerpath->indexclauses);
}
else if (IsA(best_path->innerjoinpath, BitmapHeapPath))
{
@ -1471,8 +1467,7 @@ create_nestloop_plan(PlannerInfo *root,
joinrestrictclauses =
select_nonredundant_join_clauses(root,
joinrestrictclauses,
bitmapclauses,
IS_OUTER_JOIN(best_path->jointype));
bitmapclauses);
}
}
@ -1516,7 +1511,21 @@ create_mergejoin_plan(PlannerInfo *root,
List *joinclauses;
List *otherclauses;
List *mergeclauses;
List *outerpathkeys;
List *innerpathkeys;
int nClauses;
Oid *mergefamilies;
int *mergestrategies;
bool *mergenullsfirst;
MergeJoin *join_plan;
int i;
EquivalenceClass *lastoeclass;
EquivalenceClass *lastieclass;
PathKey *opathkey;
PathKey *ipathkey;
ListCell *lc;
ListCell *lop;
ListCell *lip;
/* Get the join qual clauses (in plain expression form) */
/* Any pseudoconstant clauses are ignored here */
@ -1542,7 +1551,8 @@ create_mergejoin_plan(PlannerInfo *root,
/*
* Rearrange mergeclauses, if needed, so that the outer variable is always
* on the left.
* on the left; mark the mergeclause restrictinfos with correct
* outer_is_left status.
*/
mergeclauses = get_switched_clauses(best_path->path_mergeclauses,
best_path->jpath.outerjoinpath->parent->relids);
@ -1564,7 +1574,10 @@ create_mergejoin_plan(PlannerInfo *root,
make_sort_from_pathkeys(root,
outer_plan,
best_path->outersortkeys);
outerpathkeys = best_path->outersortkeys;
}
else
outerpathkeys = best_path->jpath.outerjoinpath->pathkeys;
if (best_path->innersortkeys)
{
@ -1573,7 +1586,86 @@ create_mergejoin_plan(PlannerInfo *root,
make_sort_from_pathkeys(root,
inner_plan,
best_path->innersortkeys);
innerpathkeys = best_path->innersortkeys;
}
else
innerpathkeys = best_path->jpath.innerjoinpath->pathkeys;
/*
* Compute the opfamily/strategy/nullsfirst arrays needed by the executor.
* The information is in the pathkeys for the two inputs, but we need to
* be careful about the possibility of mergeclauses sharing a pathkey
* (compare find_mergeclauses_for_pathkeys()).
*/
nClauses = list_length(mergeclauses);
Assert(nClauses == list_length(best_path->path_mergeclauses));
mergefamilies = (Oid *) palloc(nClauses * sizeof(Oid));
mergestrategies = (int *) palloc(nClauses * sizeof(int));
mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));
lastoeclass = NULL;
lastieclass = NULL;
opathkey = NULL;
ipathkey = NULL;
lop = list_head(outerpathkeys);
lip = list_head(innerpathkeys);
i = 0;
foreach(lc, best_path->path_mergeclauses)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
EquivalenceClass *oeclass;
EquivalenceClass *ieclass;
/* fetch outer/inner eclass from mergeclause */
Assert(IsA(rinfo, RestrictInfo));
if (rinfo->outer_is_left)
{
oeclass = rinfo->left_ec;
ieclass = rinfo->right_ec;
}
else
{
oeclass = rinfo->right_ec;
ieclass = rinfo->left_ec;
}
Assert(oeclass != NULL);
Assert(ieclass != NULL);
/* should match current or next pathkeys */
/* we check this carefully for debugging reasons */
if (oeclass != lastoeclass)
{
if (!lop)
elog(ERROR, "too few pathkeys for mergeclauses");
opathkey = (PathKey *) lfirst(lop);
lop = lnext(lop);
lastoeclass = opathkey->pk_eclass;
if (oeclass != lastoeclass)
elog(ERROR, "outer pathkeys do not match mergeclause");
}
if (ieclass != lastieclass)
{
if (!lip)
elog(ERROR, "too few pathkeys for mergeclauses");
ipathkey = (PathKey *) lfirst(lip);
lip = lnext(lip);
lastieclass = ipathkey->pk_eclass;
if (ieclass != lastieclass)
elog(ERROR, "inner pathkeys do not match mergeclause");
}
/* pathkeys should match each other too (more debugging) */
if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
opathkey->pk_strategy != ipathkey->pk_strategy ||
opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
elog(ERROR, "left and right pathkeys do not match in mergejoin");
/* OK, save info for executor */
mergefamilies[i] = opathkey->pk_opfamily;
mergestrategies[i] = opathkey->pk_strategy;
mergenullsfirst[i] = opathkey->pk_nulls_first;
i++;
}
/*
* Now we can build the mergejoin node.
@ -1582,9 +1674,9 @@ create_mergejoin_plan(PlannerInfo *root,
joinclauses,
otherclauses,
mergeclauses,
best_path->path_mergeFamilies,
best_path->path_mergeStrategies,
best_path->path_mergeNullsFirst,
mergefamilies,
mergestrategies,
mergenullsfirst,
outer_plan,
inner_plan,
best_path->jpath.jointype);
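
Annotation (not part of the commit): the array-filling loop in create_mergejoin_plan advances to the next pathkey only when a mergeclause's equivalence class differs from the previous one, so consecutive clauses may share a pathkey. A simplified sketch of that walk for one side of the join, assuming equivalence classes are modeled as integer IDs; KeySketch and assign_strategies are hypothetical names, and the real code performs the same walk for the outer and inner sides together and also checks that the two sides agree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PathKey: an equivalence class ID plus the
 * sort strategy recorded for it. */
typedef struct
{
    int eclass;
    int strategy;
} KeySketch;

/*
 * For each clause (identified by its equivalence class ID, in clause order),
 * record the strategy of the pathkey that covers it. Consecutive clauses
 * with the same class reuse the current pathkey. Returns the number of
 * clauses processed, or -1 if the pathkeys run out or do not line up.
 */
static int
assign_strategies(const int *clause_eclass, size_t nclauses,
                  const KeySketch *pathkeys, size_t npathkeys,
                  int *out_strategy)
{
    size_t next = 0;                /* next unused pathkey */
    const KeySketch *cur = NULL;    /* pathkey covering the last clause */

    for (size_t i = 0; i < nclauses; i++)
    {
        if (cur == NULL || clause_eclass[i] != cur->eclass)
        {
            if (next >= npathkeys)
                return -1;          /* too few pathkeys for the clauses */
            cur = &pathkeys[next++];
            if (cur->eclass != clause_eclass[i])
                return -1;          /* pathkeys do not match the clause */
        }
        out_strategy[i] = cur->strategy;
    }
    return (int) nclauses;
}
```

The two failure returns correspond to the "too few pathkeys for mergeclauses" and "pathkeys do not match mergeclause" errors raised in the loop above.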
@ -1921,8 +2013,9 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, Oid *opfamily)
* Given a list of merge or hash joinclauses (as RestrictInfo nodes),
* extract the bare clauses, and rearrange the elements within the
* clauses, if needed, so the outer join variable is on the left and
* the inner is on the right. The original data structure is not touched;
* a modified list is returned.
* the inner is on the right. The original clause data structure is not
* touched; a modified list is returned. We do, however, set the transient
* outer_is_left field in each RestrictInfo to show which side was which.
*/
static List *
get_switched_clauses(List *clauses, Relids outerrelids)
@ -1953,9 +2046,14 @@ get_switched_clauses(List *clauses, Relids outerrelids)
/* Commute it --- note this modifies the temp node in-place. */
CommuteOpExpr(temp);
t_list = lappend(t_list, temp);
restrictinfo->outer_is_left = false;
}
else
{
Assert(bms_is_subset(restrictinfo->left_relids, outerrelids));
t_list = lappend(t_list, clause);
restrictinfo->outer_is_left = true;
}
}
return t_list;
}
@ -2490,7 +2588,7 @@ add_sort_column(AttrNumber colIdx, Oid sortOp, bool nulls_first,
* If the input plan type isn't one that can do projections, this means
* adding a Result node just to do the projection.
*/
static Sort *
Sort *
make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys)
{
List *tlist = lefttree->targetlist;
@ -2512,41 +2610,55 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys)
foreach(i, pathkeys)
{
List *keysublist = (List *) lfirst(i);
PathKeyItem *pathkey = NULL;
PathKey *pathkey = (PathKey *) lfirst(i);
TargetEntry *tle = NULL;
Oid pk_datatype = InvalidOid;
Oid sortop;
ListCell *j;
/*
* We can sort by any one of the sort key items listed in this
* sublist. For now, we take the first one that corresponds to an
* available Var in the tlist. If there isn't any, use the first one
* that is an expression in the input's vars.
* We can sort by any non-constant expression listed in the pathkey's
* EquivalenceClass. For now, we take the first one that corresponds
* to an available Var in the tlist. If there isn't any, use the first
* one that is an expression in the input's vars. (The non-const
* restriction only matters if the EC is below_outer_join; but if it
* isn't, it won't contain consts anyway, else we'd have discarded
* the pathkey as redundant.)
*
* XXX if we have a choice, is there any way of figuring out which
* might be cheapest to execute? (For example, int4lt is likely much
* cheaper to execute than numericlt, but both might appear in the
* same pathkey sublist...) Not clear that we ever will have a choice
* in practice, so it may not matter.
* same equivalence class...) Not clear that we ever will have an
* interesting choice in practice, so it may not matter.
*/
foreach(j, keysublist)
foreach(j, pathkey->pk_eclass->ec_members)
{
pathkey = (PathKeyItem *) lfirst(j);
Assert(IsA(pathkey, PathKeyItem));
tle = tlist_member(pathkey->key, tlist);
EquivalenceMember *em = (EquivalenceMember *) lfirst(j);
if (em->em_is_const || em->em_is_child)
continue;
tle = tlist_member((Node *) em->em_expr, tlist);
if (tle)
break;
{
pk_datatype = em->em_datatype;
break; /* found expr already in tlist */
}
}
if (!tle)
{
/* No matching Var; look for a computable expression */
foreach(j, keysublist)
Expr *sortexpr = NULL;
foreach(j, pathkey->pk_eclass->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(j);
List *exprvars;
ListCell *k;
pathkey = (PathKeyItem *) lfirst(j);
exprvars = pull_var_clause(pathkey->key, false);
if (em->em_is_const || em->em_is_child)
continue;
sortexpr = em->em_expr;
exprvars = pull_var_clause((Node *) sortexpr, false);
foreach(k, exprvars)
{
if (!tlist_member(lfirst(k), tlist))
@ -2554,7 +2666,10 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys)
}
list_free(exprvars);
if (!k)
{
pk_datatype = em->em_datatype;
break; /* found usable expression */
}
}
if (!j)
elog(ERROR, "could not find pathkey item to sort");
@ -2571,7 +2686,7 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys)
/*
* Add resjunk entry to input's tlist
*/
tle = makeTargetEntry((Expr *) pathkey->key,
tle = makeTargetEntry(sortexpr,
list_length(tlist) + 1,
NULL,
true);
@ -2579,14 +2694,28 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys)
lefttree->targetlist = tlist; /* just in case NIL before */
}
/*
* Look up the correct sort operator from the PathKey's slightly
* abstracted representation.
*/
sortop = get_opfamily_member(pathkey->pk_opfamily,
pk_datatype,
pk_datatype,
pathkey->pk_strategy);
if (!OidIsValid(sortop)) /* should not happen */
elog(ERROR, "could not find member %d(%u,%u) of opfamily %u",
pathkey->pk_strategy, pk_datatype, pk_datatype,
pathkey->pk_opfamily);
/*
* The column might already be selected as a sort key, if the pathkeys
* contain duplicate entries. (This can happen in scenarios where
* multiple mergejoinable clauses mention the same var, for example.)
* So enter it only once in the sort arrays.
*/
numsortkeys = add_sort_column(tle->resno, pathkey->sortop,
pathkey->nulls_first,
numsortkeys = add_sort_column(tle->resno,
sortop,
pathkey->pk_nulls_first,
numsortkeys,
sortColIdx, sortOperators, nullsFirst);
}


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/plan/initsplan.c,v 1.127 2007/01/08 16:47:30 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/plan/initsplan.c,v 1.128 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -37,8 +37,6 @@ int from_collapse_limit;
int join_collapse_limit;
static void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
bool below_outer_join, Relids *qualscope);
static OuterJoinInfo *make_outerjoininfo(PlannerInfo *root,
@ -51,8 +49,7 @@ static void distribute_qual_to_rels(PlannerInfo *root, Node *clause,
Relids qualscope,
Relids ojscope,
Relids outerjoin_nonnullable);
static bool qual_is_redundant(PlannerInfo *root, RestrictInfo *restrictinfo,
List *restrictlist);
static bool check_outerjoin_delay(PlannerInfo *root, Relids *relids_p);
static void check_mergejoinable(RestrictInfo *restrictinfo);
static void check_hashjoinable(RestrictInfo *restrictinfo);
@ -144,7 +141,7 @@ build_base_rel_tlists(PlannerInfo *root, List *final_tlist)
* as being needed for the indicated join (or for final output if
* where_needed includes "relation 0").
*/
static void
void
add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed)
{
ListCell *temp;
@ -590,17 +587,17 @@ make_outerjoininfo(PlannerInfo *root,
* Add clause information to either the baserestrictinfo or joininfo list
* (depending on whether the clause is a join) of each base relation
* mentioned in the clause. A RestrictInfo node is created and added to
* the appropriate list for each rel. Also, if the clause uses a
* the appropriate list for each rel. Alternatively, if the clause uses a
* mergejoinable operator and is not delayed by outer-join rules, enter
* the left- and right-side expressions into the query's lists of
* equijoined vars.
* the left- and right-side expressions into the query's list of
* EquivalenceClasses.
*
* 'clause': the qual clause to be distributed
* 'is_pushed_down': if TRUE, force the clause to be marked 'is_pushed_down'
* (this indicates the clause came from a FromExpr, not a JoinExpr)
* 'is_deduced': TRUE if the qual came from implied-equality deduction
* 'below_outer_join': TRUE if the qual is from a JOIN/ON that is below the
* nullable side of a higher-level outer join.
* nullable side of a higher-level outer join
* 'qualscope': set of baserels the qual's syntactic scope covers
* 'ojscope': NULL if not an outer-join qual, else the minimum set of baserels
* needed to form this join
@ -625,11 +622,9 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
Relids relids;
bool outerjoin_delayed;
bool pseudoconstant = false;
bool maybe_equijoin;
bool maybe_equivalence;
bool maybe_outer_join;
RestrictInfo *restrictinfo;
RelOptInfo *rel;
List *vars;
/*
* Retrieve all relids mentioned within the clause.
@ -705,108 +700,57 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
if (is_deduced)
{
/*
* If the qual came from implied-equality deduction, we always
* evaluate the qual at its natural semantic level. It is the
* responsibility of the deducer not to create any quals that should
* be delayed by outer-join rules.
* If the qual came from implied-equality deduction, it should
* not be outerjoin-delayed, else deducer blew it. But we can't
* check this because the ojinfo list may now contain OJs above
* where the qual belongs.
*/
Assert(bms_equal(relids, qualscope));
Assert(!ojscope);
Assert(!pseudoconstant);
/* Needn't feed it back for more deductions */
outerjoin_delayed = false;
maybe_equijoin = false;
/* Don't feed it back for more deductions */
maybe_equivalence = false;
maybe_outer_join = false;
}
else if (bms_overlap(relids, outerjoin_nonnullable))
{
/*
* The qual is attached to an outer join and mentions (some of the)
* rels on the nonnullable side. Force the qual to be evaluated
* exactly at the level of joining corresponding to the outer join. We
* cannot let it get pushed down into the nonnullable side, since then
* we'd produce no output rows, rather than the intended single
* null-extended row, for any nonnullable-side rows failing the qual.
* rels on the nonnullable side.
*
* Note: an outer-join qual that mentions only nullable-side rels can
* be pushed down into the nullable side without changing the join
* result, so we treat it the same as an ordinary inner-join qual,
* except for not setting maybe_equijoin (see below).
* result, so we treat it almost the same as an ordinary inner-join
* qual (see below).
*
* We can't use such a clause to deduce equivalence (the left and right
* sides might be unequal above the join because one of them has gone
* to NULL) ... but we might be able to use it for more limited
* deductions, if there are no lower outer joins that delay its
* application. If so, consider adding it to the lists of set-aside
* clauses.
*/
maybe_equivalence = false;
maybe_outer_join = !check_outerjoin_delay(root, &relids);
/*
* Now force the qual to be evaluated exactly at the level of joining
* corresponding to the outer join. We cannot let it get pushed down
* into the nonnullable side, since then we'd produce no output rows,
* rather than the intended single null-extended row, for any
* nonnullable-side rows failing the qual.
*
* (Do this step after calling check_outerjoin_delay, because that
* trashes relids.)
*/
Assert(ojscope);
relids = ojscope;
outerjoin_delayed = true;
Assert(!pseudoconstant);
/*
* We can't use such a clause to deduce equijoin (the left and right
* sides might be unequal above the join because one of them has gone
* to NULL) ... but we might be able to use it for more limited
* purposes. Note: for the current uses of deductions from an
* outer-join clause, it seems safe to make the deductions even when
* the clause is below a higher-level outer join; so we do not check
* below_outer_join here.
*/
maybe_equijoin = false;
maybe_outer_join = true;
}
else
{
/*
* For a non-outer-join qual, we can evaluate the qual as soon as (1)
* we have all the rels it mentions, and (2) we are at or above any
* outer joins that can null any of these rels and are below the
* syntactic location of the given qual. We must enforce (2) because
* pushing down such a clause below the OJ might cause the OJ to emit
* null-extended rows that should not have been formed, or that should
* have been rejected by the clause. (This is only an issue for
* non-strict quals, since if we can prove a qual mentioning only
* nullable rels is strict, we'd have reduced the outer join to an
* inner join in reduce_outer_joins().)
*
* To enforce (2), scan the oj_info_list and merge the required-relid
* sets of any such OJs into the clause's own reference list. At the
* time we are called, the oj_info_list contains only outer joins
* below this qual. We have to repeat the scan until no new relids
* get added; this ensures that the qual is suitably delayed regardless
* of the order in which OJs get executed. As an example, if we have
* one OJ with LHS=A, RHS=B, and one with LHS=B, RHS=C, it is implied
* that these can be done in either order; if the B/C join is done
* first then the join to A can null C, so a qual actually mentioning
* only C cannot be applied below the join to A.
*/
bool found_some;
outerjoin_delayed = false;
do {
ListCell *l;
found_some = false;
foreach(l, root->oj_info_list)
{
OuterJoinInfo *ojinfo = (OuterJoinInfo *) lfirst(l);
/* do we have any nullable rels of this OJ? */
if (bms_overlap(relids, ojinfo->min_righthand) ||
(ojinfo->is_full_join &&
bms_overlap(relids, ojinfo->min_lefthand)))
{
/* yes; do we have all its rels? */
if (!bms_is_subset(ojinfo->min_lefthand, relids) ||
!bms_is_subset(ojinfo->min_righthand, relids))
{
/* no, so add them in */
relids = bms_add_members(relids,
ojinfo->min_lefthand);
relids = bms_add_members(relids,
ojinfo->min_righthand);
outerjoin_delayed = true;
/* we'll need another iteration */
found_some = true;
}
}
}
} while (found_some);
/* Normal qual clause; check to see if must be delayed by outer join */
outerjoin_delayed = check_outerjoin_delay(root, &relids);
if (outerjoin_delayed)
{
@ -816,26 +760,27 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
* Because application of the qual will be delayed by outer join,
* we mustn't assume its vars are equal everywhere.
*/
maybe_equijoin = false;
maybe_equivalence = false;
}
else
{
/*
* Qual is not delayed by any lower outer-join restriction. If it
* is not itself below or within an outer join, we can consider it
* "valid everywhere", so consider feeding it to the equijoin
* machinery. (If it is within an outer join, we can't consider
* it "valid everywhere": once the contained variables have gone
* to NULL, we'd be asserting things like NULL = NULL, which is
* not true.)
* Qual is not delayed by any lower outer-join restriction, so
* we can consider feeding it to the equivalence machinery.
* However, if it's itself within an outer-join clause, treat it
* as though it appeared below that outer join (note that we can
* only get here when the clause references only nullable-side
* rels).
*/
if (!below_outer_join && outerjoin_nonnullable == NULL)
maybe_equijoin = true;
else
maybe_equijoin = false;
maybe_equivalence = true;
if (outerjoin_nonnullable != NULL)
below_outer_join = true;
}
/* Since it doesn't mention the LHS, it's certainly not an OJ clause */
/*
* Since it doesn't mention the LHS, it's certainly not useful as a
* set-aside OJ clause, even if it's in an OJ.
*/
maybe_outer_join = false;
}
@ -860,118 +805,65 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
relids);
/*
* Figure out where to attach it.
* If it's a join clause (either naturally, or because delayed by
* outer-join rules), add vars used in the clause to targetlists of
* their relations, so that they will be emitted by the plan nodes that
* scan those relations (else they won't be available at the join node!).
*
* Note: if the clause gets absorbed into an EquivalenceClass then this
* may be unnecessary, but for now we have to do it to cover the case
* where the EC becomes ec_broken and we end up reinserting the original
* clauses into the plan.
*/
switch (bms_membership(relids))
if (bms_membership(relids) == BMS_MULTIPLE)
{
case BMS_SINGLETON:
List *vars = pull_var_clause(clause, false);
/*
* There is only one relation participating in 'clause', so
* 'clause' is a restriction clause for that relation.
*/
rel = find_base_rel(root, bms_singleton_member(relids));
/*
* Check for a "mergejoinable" clause even though it's not a join
* clause. This is so that we can recognize that "a.x = a.y"
* makes x and y eligible to be considered equal, even when they
* belong to the same rel. Without this, we would not recognize
* that "a.x = a.y AND a.x = b.z AND a.y = c.q" allows us to
* consider z and q equal after their rels are joined.
*/
check_mergejoinable(restrictinfo);
/*
* If the clause was deduced from implied equality, check to see
* whether it is redundant with restriction clauses we already
* have for this rel. Note we cannot apply this check to
* user-written clauses, since we haven't found the canonical
* pathkey sets yet while processing user clauses. (NB: no
* comparable check is done in the join-clause case; redundancy
* will be detected when the join clause is moved into a join
* rel's restriction list.)
*/
if (!is_deduced ||
!qual_is_redundant(root, restrictinfo,
rel->baserestrictinfo))
{
/* Add clause to rel's restriction list */
rel->baserestrictinfo = lappend(rel->baserestrictinfo,
restrictinfo);
}
break;
case BMS_MULTIPLE:
/*
* 'clause' is a join clause, since there is more than one rel in
* the relid set.
*/
/*
* Check for hash or mergejoinable operators.
*
* We don't bother setting the hashjoin info if we're not going to
* need it. We do want to know about mergejoinable ops in all
* cases, however, because we use mergejoinable ops for other
* purposes such as detecting redundant clauses.
*/
check_mergejoinable(restrictinfo);
if (enable_hashjoin)
check_hashjoinable(restrictinfo);
/*
* Add clause to the join lists of all the relevant relations.
*/
add_join_clause_to_rels(root, restrictinfo, relids);
/*
* Add vars used in the join clause to targetlists of their
* relations, so that they will be emitted by the plan nodes that
* scan those relations (else they won't be available at the join
* node!).
*/
vars = pull_var_clause(clause, false);
add_vars_to_targetlist(root, vars, relids);
list_free(vars);
break;
default:
/*
* 'clause' references no rels, and therefore we have no place to
* attach it. Shouldn't get here if callers are working properly.
*/
elog(ERROR, "cannot cope with variable-free clause");
break;
add_vars_to_targetlist(root, vars, relids);
list_free(vars);
}
/*
* If the clause has a mergejoinable operator, we may be able to deduce
* more things from it under the principle of transitivity.
* We check "mergejoinability" of every clause, not only join clauses,
* because we want to know about equivalences between vars of the same
* relation, or between vars and consts.
*/
check_mergejoinable(restrictinfo);
/*
* If it is a true equivalence clause, send it to the EquivalenceClass
* machinery. We do *not* attach it directly to any restriction or join
* lists. The EC code will propagate it to the appropriate places later.
*
* If it is not an outer-join qualification nor bubbled up due to an outer
* join, then the two sides represent equivalent PathKeyItems for path
* keys: any path that is sorted by one side will also be sorted by the
* other (as soon as the two rels are joined, that is). Pass such clauses
* to add_equijoined_keys.
* If the clause has a mergejoinable operator and is not outerjoin-delayed,
* yet isn't an equivalence because it is an outer-join clause, the EC
* code may yet be able to do something with it. We add it to appropriate
* lists for further consideration later. Specifically:
*
* If it is a left or right outer-join qualification that relates the two
* sides of the outer join (no funny business like leftvar1 = leftvar2 +
* rightvar), we add it to root->left_join_clauses or
* If it is a left or right outer-join qualification that relates the
* two sides of the outer join (no funny business like leftvar1 =
* leftvar2 + rightvar), we add it to root->left_join_clauses or
* root->right_join_clauses according to which side the nonnullable
* variable appears on.
*
* If it is a full outer-join qualification, we add it to
* root->full_join_clauses. (Ideally we'd discard cases that aren't
* leftvar = rightvar, as we do for left/right joins, but this routine
* doesn't have the info needed to do that; and the current usage of the
* full_join_clauses list doesn't require that, so it's not currently
* worth complicating this routine's API to make it possible.)
* doesn't have the info needed to do that; and the current usage of
* the full_join_clauses list doesn't require that, so it's not
* currently worth complicating this routine's API to make it possible.)
*
* If none of the above hold, pass it off to
* distribute_restrictinfo_to_rels().
*/
if (restrictinfo->mergejoinoperator != InvalidOid)
if (restrictinfo->mergeopfamilies)
{
if (maybe_equijoin)
add_equijoined_keys(root, restrictinfo);
if (maybe_equivalence)
{
if (process_equivalence(root, restrictinfo, below_outer_join))
return;
/* EC rejected it, so pass to distribute_restrictinfo_to_rels */
}
else if (maybe_outer_join && restrictinfo->can_join)
{
if (bms_is_subset(restrictinfo->left_relids,
@ -982,8 +874,9 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
/* we have outervar = innervar */
root->left_join_clauses = lappend(root->left_join_clauses,
restrictinfo);
return;
}
else if (bms_is_subset(restrictinfo->right_relids,
if (bms_is_subset(restrictinfo->right_relids,
outerjoin_nonnullable) &&
!bms_overlap(restrictinfo->left_relids,
outerjoin_nonnullable))
@ -991,166 +884,213 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
/* we have innervar = outervar */
root->right_join_clauses = lappend(root->right_join_clauses,
restrictinfo);
return;
}
else if (bms_equal(outerjoin_nonnullable, qualscope))
if (bms_equal(outerjoin_nonnullable, qualscope))
{
/* FULL JOIN (above tests cannot match in this case) */
root->full_join_clauses = lappend(root->full_join_clauses,
restrictinfo);
return;
}
}
}
/* No EC special case applies, so push it into the clause lists */
distribute_restrictinfo_to_rels(root, restrictinfo);
}
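The left/right/full classification at the end of distribute_qual_to_rels() can be illustrated with a small standalone sketch. It is not the planner's code: `RelMask`, `OJClauseKind`, and `classify_oj_clause` are hypothetical names, and relid sets are modeled as plain bitmasks rather than PostgreSQL's Bitmapset. The first test's second half is elided by a hunk boundary in the diff above, so the sketch assumes it mirrors the second test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means base rel i is a member. */
typedef uint32_t RelMask;

/* Hypothetical labels for the three set-aside lists, plus "none". */
typedef enum
{
    OJ_CLAUSE_LEFT,             /* outervar = innervar */
    OJ_CLAUSE_RIGHT,            /* innervar = outervar */
    OJ_CLAUSE_FULL,             /* clause of a FULL JOIN */
    OJ_CLAUSE_NONE              /* falls through to the ordinary clause lists */
} OJClauseKind;

static bool
mask_is_subset(RelMask a, RelMask b)
{
    return (a & ~b) == 0;
}

static bool
mask_overlap(RelMask a, RelMask b)
{
    return (a & b) != 0;
}

/*
 * Classify a mergejoinable outer-join clause by which side of the operator
 * holds the nonnullable-side expression.  Both operand sets are assumed
 * non-empty, as they would be for a clause marked can_join.
 */
static OJClauseKind
classify_oj_clause(RelMask left_relids, RelMask right_relids,
                   RelMask nonnullable, RelMask qualscope)
{
    assert(left_relids != 0 && right_relids != 0);

    if (mask_is_subset(left_relids, nonnullable) &&
        !mask_overlap(right_relids, nonnullable))
        return OJ_CLAUSE_LEFT;
    if (mask_is_subset(right_relids, nonnullable) &&
        !mask_overlap(left_relids, nonnullable))
        return OJ_CLAUSE_RIGHT;
    if (nonnullable == qualscope)
        return OJ_CLAUSE_FULL;  /* the two tests above cannot match here */
    return OJ_CLAUSE_NONE;
}
```

With rel A as bit 0 and rel B as bit 1, a clause `a.x = b.y` in `A LEFT JOIN B` has nonnullable = {A} and is classified as the left case; a clause such as `a.x = a.y + b.z` fails all three tests because its right side overlaps the nonnullable set.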
/*
* check_outerjoin_delay
* Detect whether a qual referencing the given relids must be delayed
* in application due to the presence of a lower outer join.
*
* If so, add relids to *relids_p to reflect the lowest safe level for
* evaluating the qual, and return TRUE.
*
* For a non-outer-join qual, we can evaluate the qual as soon as (1) we have
* all the rels it mentions, and (2) we are at or above any outer joins that
* can null any of these rels and are below the syntactic location of the
* given qual. We must enforce (2) because pushing down such a clause below
* the OJ might cause the OJ to emit null-extended rows that should not have
* been formed, or that should have been rejected by the clause. (This is
* only an issue for non-strict quals, since if we can prove a qual mentioning
* only nullable rels is strict, we'd have reduced the outer join to an inner
* join in reduce_outer_joins().)
*
* To enforce (2), scan the oj_info_list and merge the required-relid sets of
* any such OJs into the clause's own reference list. At the time we are
* called, the oj_info_list contains only outer joins below this qual. We
* have to repeat the scan until no new relids get added; this ensures that
* the qual is suitably delayed regardless of the order in which OJs get
* executed. As an example, if we have one OJ with LHS=A, RHS=B, and one with
* LHS=B, RHS=C, it is implied that these can be done in either order; if the
* B/C join is done first then the join to A can null C, so a qual actually
* mentioning only C cannot be applied below the join to A.
*
* For an outer-join qual, this isn't going to determine where we place the
* qual, but we need to determine outerjoin_delayed anyway so we can decide
* whether the qual is potentially useful for equivalence deductions.
*/
static bool
check_outerjoin_delay(PlannerInfo *root, Relids *relids_p)
{
Relids relids = *relids_p;
bool outerjoin_delayed;
bool found_some;
outerjoin_delayed = false;
do {
ListCell *l;
found_some = false;
foreach(l, root->oj_info_list)
{
OuterJoinInfo *ojinfo = (OuterJoinInfo *) lfirst(l);
/* do we reference any nullable rels of this OJ? */
if (bms_overlap(relids, ojinfo->min_righthand) ||
(ojinfo->is_full_join &&
bms_overlap(relids, ojinfo->min_lefthand)))
{
/* yes; have we included all its rels in relids? */
if (!bms_is_subset(ojinfo->min_lefthand, relids) ||
!bms_is_subset(ojinfo->min_righthand, relids))
{
/* no, so add them in */
relids = bms_add_members(relids, ojinfo->min_lefthand);
relids = bms_add_members(relids, ojinfo->min_righthand);
outerjoin_delayed = true;
/* we'll need another iteration */
found_some = true;
}
}
}
} while (found_some);
*relids_p = relids;
return outerjoin_delayed;
}
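The repeat-until-stable scan in check_outerjoin_delay() can be tried out in isolation. The following is a minimal sketch under stated assumptions: relid sets are modeled as 32-bit masks, and `RelMask`, `ToyOuterJoin`, and `toy_check_outerjoin_delay` are hypothetical names, not planner APIs. The loop structure follows the function above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means base rel i is a member. */
typedef uint32_t RelMask;

/* Hypothetical, reduced version of OuterJoinInfo. */
typedef struct
{
    RelMask     min_lefthand;
    RelMask     min_righthand;
    bool        is_full_join;
} ToyOuterJoin;

/*
 * Grow *relids_p until it contains every outer join that can null one of
 * its members.  Returns true if anything had to be added.
 */
static bool
toy_check_outerjoin_delay(const ToyOuterJoin *ojs, size_t noj,
                          RelMask *relids_p)
{
    RelMask     relids;
    bool        delayed = false;
    bool        found_some;

    assert(relids_p != NULL);
    relids = *relids_p;

    do
    {
        found_some = false;
        for (size_t i = 0; i < noj; i++)
        {
            RelMask     all = ojs[i].min_lefthand | ojs[i].min_righthand;
            bool        refs_nullable;

            /* do we reference any nullable rels of this OJ? */
            refs_nullable = (relids & ojs[i].min_righthand) != 0 ||
                (ojs[i].is_full_join &&
                 (relids & ojs[i].min_lefthand) != 0);

            /* if so, and some of its rels are missing, add them all */
            if (refs_nullable && (all & ~relids) != 0)
            {
                relids |= all;
                delayed = true;
                found_some = true;  /* another pass is needed */
            }
        }
    } while (found_some);

    *relids_p = relids;
    return delayed;
}
```

For the example in the header comment (one OJ with LHS=A, RHS=B and one with LHS=B, RHS=C), a qual mentioning only C first picks up B from the B/C join and then, on the next pass, A from the A/B join, which is why a single scan is not enough.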
/*
* distribute_restrictinfo_to_rels
* Push a completed RestrictInfo into the proper restriction or join
* clause list(s).
*
* This is the last step of distribute_qual_to_rels() for ordinary qual
* clauses. Clauses that are interesting for equivalence-class processing
* are diverted to the EC machinery, but may ultimately get fed back here.
*/
void
distribute_restrictinfo_to_rels(PlannerInfo *root,
RestrictInfo *restrictinfo)
{
Relids relids = restrictinfo->required_relids;
RelOptInfo *rel;
switch (bms_membership(relids))
{
case BMS_SINGLETON:
/*
* There is only one relation participating in the clause, so
* it is a restriction clause for that relation.
*/
rel = find_base_rel(root, bms_singleton_member(relids));
/* Add clause to rel's restriction list */
rel->baserestrictinfo = lappend(rel->baserestrictinfo,
restrictinfo);
break;
case BMS_MULTIPLE:
/*
* The clause is a join clause, since there is more than one rel
* in its relid set.
*/
/*
* Check for hashjoinable operators. (We don't bother setting
* the hashjoin info if we're not going to need it.)
*/
if (enable_hashjoin)
check_hashjoinable(restrictinfo);
/*
* Add clause to the join lists of all the relevant relations.
*/
add_join_clause_to_rels(root, restrictinfo, relids);
break;
default:
/*
* clause references no rels, and therefore we have no place to
* attach it. Shouldn't get here if callers are working properly.
*/
elog(ERROR, "cannot cope with variable-free clause");
break;
}
}
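The dispatch in distribute_restrictinfo_to_rels() depends only on how many rels the clause requires. A rough standalone sketch, assuming a bitmask model of relid sets (`RelMask`, `ToyMembership`, `ToyDestination`, and both functions are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means base rel i is a member. */
typedef uint32_t RelMask;

typedef enum
{
    TOY_EMPTY_SET,
    TOY_SINGLETON,
    TOY_MULTIPLE
} ToyMembership;

typedef enum
{
    TOY_DEST_BASERESTRICT,      /* restriction clause of a single rel */
    TOY_DEST_JOININFO,          /* join clause, attached to each rel */
    TOY_DEST_ERROR              /* variable-free clause: nowhere to attach */
} ToyDestination;

/* Count members only as far as needed: zero, one, or more than one. */
static ToyMembership
toy_membership(RelMask m)
{
    if (m == 0)
        return TOY_EMPTY_SET;
    if ((m & (m - 1)) == 0)     /* exactly one bit set */
        return TOY_SINGLETON;
    return TOY_MULTIPLE;
}

/* Mirror the three switch arms of the function above. */
static ToyDestination
toy_clause_destination(RelMask required_relids)
{
    switch (toy_membership(required_relids))
    {
        case TOY_SINGLETON:
            return TOY_DEST_BASERESTRICT;
        case TOY_MULTIPLE:
            return TOY_DEST_JOININFO;
        default:
            break;
    }
    return TOY_DEST_ERROR;
}
```

Note that the decision uses the required relids, which may have been enlarged by outer-join delay, so a clause that mentions one rel can still end up treated as a join clause.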
/*
* process_implied_equality
* Check to see whether we already have a restrictinfo item that says
* item1 = item2, and create one if not; or if delete_it is true,
* remove any such restrictinfo item.
* Create a restrictinfo item that says "item1 op item2", and push it
* into the appropriate lists. (In practice opno is always a btree
* equality operator.)
*
* This processing is a consequence of transitivity of mergejoin equality:
* if we have mergejoinable clauses A = B and B = C, we can deduce A = C
* (where = is an appropriate mergejoinable operator). See path/pathkeys.c
* for more details.
* "qualscope" is the nominal syntactic level to impute to the restrictinfo.
* This must contain at least all the rels used in the expressions, but it
* is used only to set the qual application level when both exprs are
* variable-free. Otherwise the qual is applied at the lowest join level
* that provides all its variables.
*
* "both_const" indicates whether both items are known pseudo-constant;
* in this case it is worth applying eval_const_expressions() in case we
* can produce constant TRUE or constant FALSE. (Otherwise it's not,
* because the expressions went through eval_const_expressions already.)
*
* This is currently used only when an EquivalenceClass is found to
* contain pseudoconstants. See path/pathkeys.c for more details.
*/
void
process_implied_equality(PlannerInfo *root,
Node *item1, Node *item2,
Oid sortop1, Oid sortop2,
Relids item1_relids, Relids item2_relids,
bool delete_it)
Oid opno,
Expr *item1,
Expr *item2,
Relids qualscope,
bool below_outer_join,
bool both_const)
{
Relids relids;
BMS_Membership membership;
RelOptInfo *rel1;
List *restrictlist;
ListCell *itm;
Oid ltype,
rtype;
Operator eq_operator;
Form_pg_operator pgopform;
Expr *clause;
/* Get set of relids referenced in the two expressions */
relids = bms_union(item1_relids, item2_relids);
membership = bms_membership(relids);
/*
* generate_implied_equalities() shouldn't call me on two constants.
* Build the new clause. Copy to ensure it shares no substructure with
* original (this is necessary in case there are subselects in there...)
*/
Assert(membership != BMS_EMPTY_SET);
/*
* If the exprs involve a single rel, we need to look at that rel's
* baserestrictinfo list. If multiple rels, we can scan the joininfo list
* of any of 'em.
*/
if (membership == BMS_SINGLETON)
{
rel1 = find_base_rel(root, bms_singleton_member(relids));
restrictlist = rel1->baserestrictinfo;
}
else
{
Relids other_rels;
int first_rel;
/* Copy relids, find and remove one member */
other_rels = bms_copy(relids);
first_rel = bms_first_member(other_rels);
bms_free(other_rels);
rel1 = find_base_rel(root, first_rel);
restrictlist = rel1->joininfo;
}
/*
* Scan to see if equality is already known. If so, we're done in the add
* case, and done after removing it in the delete case.
*/
foreach(itm, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(itm);
Node *left,
*right;
if (restrictinfo->mergejoinoperator == InvalidOid)
continue; /* ignore non-mergejoinable clauses */
/* We now know the restrictinfo clause is a binary opclause */
left = get_leftop(restrictinfo->clause);
right = get_rightop(restrictinfo->clause);
if ((equal(item1, left) && equal(item2, right)) ||
(equal(item2, left) && equal(item1, right)))
{
/* found a matching clause */
if (delete_it)
{
if (membership == BMS_SINGLETON)
{
/* delete it from local restrictinfo list */
rel1->baserestrictinfo = list_delete_ptr(rel1->baserestrictinfo,
restrictinfo);
}
else
{
/* let joininfo.c do it */
remove_join_clause_from_rels(root, restrictinfo, relids);
}
}
return; /* done */
}
}
/* Didn't find it. Done if deletion requested */
if (delete_it)
return;
/*
* This equality is new information, so construct a clause representing it
* to add to the query data structures.
*/
ltype = exprType(item1);
rtype = exprType(item2);
eq_operator = compatible_oper(NULL, list_make1(makeString("=")),
ltype, rtype,
true, -1);
if (!HeapTupleIsValid(eq_operator))
{
/*
* Would it be safe to just not add the equality to the query if we
* have no suitable equality operator for the combination of
* datatypes? NO, because sortkey selection may screw up anyway.
*/
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify an equality operator for types %s and %s",
format_type_be(ltype), format_type_be(rtype))));
}
pgopform = (Form_pg_operator) GETSTRUCT(eq_operator);
/*
* Let's just make sure this appears to be a compatible operator.
*
* XXX needs work
*/
if (pgopform->oprresult != BOOLOID)
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("equality operator for types %s and %s should be merge-joinable, but isn't",
format_type_be(ltype), format_type_be(rtype))));
/*
* Now we can build the new clause. Copy to ensure it shares no
* substructure with original (this is necessary in case there are
* subselects in there...)
*/
clause = make_opclause(oprid(eq_operator), /* opno */
clause = make_opclause(opno,
BOOLOID, /* opresulttype */
false, /* opretset */
(Expr *) copyObject(item1),
(Expr *) copyObject(item2));
ReleaseSysCache(eq_operator);
/* If both constant, try to reduce to a boolean constant. */
if (both_const)
{
clause = (Expr *) eval_const_expressions((Node *) clause);
/* If we produced const TRUE, just drop the clause */
if (clause && IsA(clause, Const))
{
Const *cclause = (Const *) clause;
Assert(cclause->consttype == BOOLOID);
if (!cclause->constisnull && DatumGetBool(cclause->constvalue))
return;
}
}
/* Make a copy of qualscope to avoid problems if source EC changes */
qualscope = bms_copy(qualscope);
/*
* Push the new clause into all the appropriate restrictinfo lists.
@ -1159,119 +1099,53 @@ process_implied_equality(PlannerInfo *root,
* taken for an original JOIN/ON clause.
*/
distribute_qual_to_rels(root, (Node *) clause,
true, true, false, relids, NULL, NULL);
true, true, below_outer_join,
qualscope, NULL, NULL);
}
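The transitivity idea behind equivalence classes (A = B and B = C put A, B, and C in one class, so A = C need not be stored as a separate clause) can be sketched with a small union-find structure. This is only an illustration of the concept; it is not how the planner represents EquivalenceClasses, and `ToyClasses` and the `toy_*` functions are hypothetical names. Expressions are identified by small integers.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_EXPRS 16

/* Each expression points at a parent; a root represents its whole class. */
typedef struct
{
    int         parent[TOY_MAX_EXPRS];
} ToyClasses;

/* Start with every expression in a class of its own. */
static void
toy_init(ToyClasses *ec)
{
    for (int i = 0; i < TOY_MAX_EXPRS; i++)
        ec->parent[i] = i;
}

/* Find the representative of x's class, shortening the path as we go. */
static int
toy_find(ToyClasses *ec, int x)
{
    assert(x >= 0 && x < TOY_MAX_EXPRS);
    while (ec->parent[x] != x)
    {
        ec->parent[x] = ec->parent[ec->parent[x]];
        x = ec->parent[x];
    }
    return x;
}

/* Record "a = b": merge the two classes if they are still distinct. */
static void
toy_add_equality(ToyClasses *ec, int a, int b)
{
    int         ra = toy_find(ec, a);
    int         rb = toy_find(ec, b);

    if (ra != rb)
        ec->parent[rb] = ra;
}

/* Two expressions are known equal iff they share a representative. */
static bool
toy_known_equal(ToyClasses *ec, int a, int b)
{
    return toy_find(ec, a) == toy_find(ec, b);
}
```

In this model, checking whether an equality is redundant becomes a lookup of two representatives instead of a repeated scan over existing clauses.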
/*
* qual_is_redundant
* Detect whether an implied-equality qual that turns out to be a
* restriction clause for a single base relation is redundant with
* already-known restriction clauses for that rel. This occurs with,
* for example,
* SELECT * FROM tab WHERE f1 = f2 AND f2 = f3;
* We need to suppress the redundant condition to avoid computing
* too-small selectivity, not to mention wasting time at execution.
* build_implied_join_equality --- build a RestrictInfo for a derived equality
*
* Note: quals of the form "var = const" are never considered redundant,
* only those of the form "var = var". This is needed because when we
* have constants in an implied-equality set, we use a different strategy
* that suppresses all "var = var" deductions. We must therefore keep
* all the "var = const" quals.
* This overlaps the functionality of process_implied_equality(), but we
* must return the RestrictInfo, not push it into the joininfo tree.
*/
static bool
qual_is_redundant(PlannerInfo *root,
RestrictInfo *restrictinfo,
List *restrictlist)
RestrictInfo *
build_implied_join_equality(Oid opno,
Expr *item1,
Expr *item2,
Relids qualscope)
{
Node *newleft;
Node *newright;
List *oldquals;
ListCell *olditem;
List *equalexprs;
bool someadded;
/* Never redundant unless vars appear on both sides */
if (bms_is_empty(restrictinfo->left_relids) ||
bms_is_empty(restrictinfo->right_relids))
return false;
newleft = get_leftop(restrictinfo->clause);
newright = get_rightop(restrictinfo->clause);
RestrictInfo *restrictinfo;
Expr *clause;
/*
* Set cached pathkeys. NB: it is okay to do this now because this
* routine is only invoked while we are generating implied equalities.
* Therefore, the equi_key_list is already complete and so we can
* correctly determine canonical pathkeys.
* Build the new clause. Copy to ensure it shares no substructure with
* original (this is necessary in case there are subselects in there...)
*/
cache_mergeclause_pathkeys(root, restrictinfo);
/* If different, say "not redundant" (should never happen) */
if (restrictinfo->left_pathkey != restrictinfo->right_pathkey)
return false;
clause = make_opclause(opno,
BOOLOID, /* opresulttype */
false, /* opretset */
(Expr *) copyObject(item1),
(Expr *) copyObject(item2));
/* Make a copy of qualscope to avoid problems if source EC changes */
qualscope = bms_copy(qualscope);
/*
* Scan existing quals to find those referencing same pathkeys. Usually
* there will be few, if any, so build a list of just the interesting
* ones.
* Build the RestrictInfo node itself.
*/
oldquals = NIL;
foreach(olditem, restrictlist)
{
RestrictInfo *oldrinfo = (RestrictInfo *) lfirst(olditem);
restrictinfo = make_restrictinfo(clause,
true, /* is_pushed_down */
false, /* outerjoin_delayed */
false, /* pseudoconstant */
qualscope);
if (oldrinfo->mergejoinoperator != InvalidOid)
{
cache_mergeclause_pathkeys(root, oldrinfo);
if (restrictinfo->left_pathkey == oldrinfo->left_pathkey &&
restrictinfo->right_pathkey == oldrinfo->right_pathkey)
oldquals = lcons(oldrinfo, oldquals);
}
}
if (oldquals == NIL)
return false;
/* Set mergejoinability info always, and hashjoinability if enabled */
check_mergejoinable(restrictinfo);
if (enable_hashjoin)
check_hashjoinable(restrictinfo);
/*
* Now, we want to develop a list of exprs that are known equal to the
* left side of the new qual. We traverse the old-quals list repeatedly
* to transitively expand the exprs list. If at any point we find we can
* reach the right-side expr of the new qual, we are done. We give up
* when we can't expand the equalexprs list any more.
*/
equalexprs = list_make1(newleft);
do
{
someadded = false;
/* cannot use foreach here because of possible list_delete */
olditem = list_head(oldquals);
while (olditem)
{
RestrictInfo *oldrinfo = (RestrictInfo *) lfirst(olditem);
Node *oldleft = get_leftop(oldrinfo->clause);
Node *oldright = get_rightop(oldrinfo->clause);
Node *newguy = NULL;
/* must advance olditem before list_delete possibly pfree's it */
olditem = lnext(olditem);
if (list_member(equalexprs, oldleft))
newguy = oldright;
else if (list_member(equalexprs, oldright))
newguy = oldleft;
else
continue;
if (equal(newguy, newright))
return true; /* we proved new clause is redundant */
equalexprs = lcons(newguy, equalexprs);
someadded = true;
/*
* Remove this qual from list, since we don't need it anymore.
*/
oldquals = list_delete_ptr(oldquals, oldrinfo);
}
} while (someadded);
return false; /* it's not redundant */
return restrictinfo;
}
@ -1294,10 +1168,7 @@ static void
check_mergejoinable(RestrictInfo *restrictinfo)
{
Expr *clause = restrictinfo->clause;
Oid opno,
leftOp,
rightOp;
Oid opfamily;
Oid opno;
if (restrictinfo->pseudoconstant)
return;
@ -1310,16 +1181,13 @@ check_mergejoinable(RestrictInfo *restrictinfo)
if (op_mergejoinable(opno) &&
!contain_volatile_functions((Node *) clause))
{
/* XXX for the moment, continue to force use of particular sortops */
if (get_op_mergejoin_info(opno, &leftOp, &rightOp, &opfamily))
{
restrictinfo->mergejoinoperator = opno;
restrictinfo->left_sortop = leftOp;
restrictinfo->right_sortop = rightOp;
restrictinfo->mergeopfamily = opfamily;
}
}
restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno);
/*
* Note: op_mergejoinable is just a hint; if we fail to find the
* operator in any btree opfamilies, mergeopfamilies remains NIL
* and so the clause is not treated as mergejoinable.
*/
}
/*


@ -14,7 +14,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planmain.c,v 1.98 2007/01/05 22:19:32 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planmain.c,v 1.99 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -110,14 +110,14 @@ query_planner(PlannerInfo *root, List *tlist, double tuple_fraction,
* for "simple" rels.
*
* NOTE: in_info_list and append_rel_list were set up by subquery_planner,
* do not touch here
* do not touch here; eq_classes may contain data already, too.
*/
root->simple_rel_array_size = list_length(parse->rtable) + 1;
root->simple_rel_array = (RelOptInfo **)
palloc0(root->simple_rel_array_size * sizeof(RelOptInfo *));
root->join_rel_list = NIL;
root->join_rel_hash = NULL;
root->equi_key_list = NIL;
root->canon_pathkeys = NIL;
root->left_join_clauses = NIL;
root->right_join_clauses = NIL;
root->full_join_clauses = NIL;
@ -165,8 +165,8 @@ query_planner(PlannerInfo *root, List *tlist, double tuple_fraction,
* Examine the targetlist and qualifications, adding entries to baserel
* targetlists for all referenced Vars. Restrict and join clauses are
* added to appropriate lists belonging to the mentioned relations. We
* also build lists of equijoined keys for pathkey construction, and form
* a target joinlist for make_one_rel() to work from.
* also build EquivalenceClasses for provably equivalent expressions,
* and form a target joinlist for make_one_rel() to work from.
*
* Note: all subplan nodes will have "flat" (var-only) tlists. This
* implies that all expression evaluations are done at the root of the
@ -179,16 +179,23 @@ query_planner(PlannerInfo *root, List *tlist, double tuple_fraction,
joinlist = deconstruct_jointree(root);
/*
* Use the completed lists of equijoined keys to deduce any implied but
* unstated equalities (for example, A=B and B=C imply A=C).
* Reconsider any postponed outer-join quals now that we have built up
* equivalence classes. (This could result in further additions or
* mergings of classes.)
*/
generate_implied_equalities(root);
reconsider_outer_join_clauses(root);
/*
* We should now have all the pathkey equivalence sets built, so it's now
* possible to convert the requested query_pathkeys to canonical form.
* Also canonicalize the groupClause and sortClause pathkeys for use
* later.
* If we formed any equivalence classes, generate additional restriction
* clauses as appropriate. (Implied join clauses are formed on-the-fly
* later.)
*/
generate_base_implied_equalities(root);
/*
* We have completed merging equivalence sets, so it's now possible to
* convert the requested query_pathkeys to canonical form. Also
* canonicalize the groupClause and sortClause pathkeys for use later.
*/
root->query_pathkeys = canonicalize_pathkeys(root, root->query_pathkeys);
root->group_pathkeys = canonicalize_pathkeys(root, root->group_pathkeys);


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planner.c,v 1.211 2007/01/10 18:06:03 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/plan/planner.c,v 1.212 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -206,6 +206,8 @@ subquery_planner(Query *parse, double tuple_fraction,
/* Create a PlannerInfo data structure for this subquery */
root = makeNode(PlannerInfo);
root->parse = parse;
root->planner_cxt = CurrentMemoryContext;
root->eq_classes = NIL;
root->in_info_list = NIL;
root->append_rel_list = NIL;
@ -715,9 +717,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
* operation's result. We have to do this before overwriting the sort
* key information...
*/
current_pathkeys = make_pathkeys_for_sortclauses(set_sortclauses,
result_plan->targetlist);
current_pathkeys = canonicalize_pathkeys(root, current_pathkeys);
current_pathkeys = make_pathkeys_for_sortclauses(root,
set_sortclauses,
result_plan->targetlist,
true);
/*
* We should not need to call preprocess_targetlist, since we must be
@ -742,9 +745,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
/*
* Calculate pathkeys that represent result ordering requirements
*/
sort_pathkeys = make_pathkeys_for_sortclauses(parse->sortClause,
tlist);
sort_pathkeys = canonicalize_pathkeys(root, sort_pathkeys);
sort_pathkeys = make_pathkeys_for_sortclauses(root,
parse->sortClause,
tlist,
true);
}
else
{
@ -778,12 +782,18 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
/*
* Calculate pathkeys that represent grouping/ordering requirements.
* Stash them in PlannerInfo so that query_planner can canonicalize
* them.
* them after EquivalenceClasses have been formed.
*/
root->group_pathkeys =
make_pathkeys_for_sortclauses(parse->groupClause, tlist);
make_pathkeys_for_sortclauses(root,
parse->groupClause,
tlist,
false);
root->sort_pathkeys =
make_pathkeys_for_sortclauses(parse->sortClause, tlist);
make_pathkeys_for_sortclauses(root,
parse->sortClause,
tlist,
false);
/*
* Will need actual number of aggregates for estimating costs.
@ -1069,10 +1079,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
{
if (!pathkeys_contained_in(sort_pathkeys, current_pathkeys))
{
result_plan = (Plan *)
make_sort_from_sortclauses(root,
parse->sortClause,
result_plan);
result_plan = (Plan *) make_sort_from_pathkeys(root,
result_plan,
sort_pathkeys);
current_pathkeys = sort_pathkeys;
}
}


@ -15,7 +15,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/prep/prepjointree.c,v 1.45 2007/01/05 22:19:32 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/prep/prepjointree.c,v 1.46 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -292,6 +292,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
*/
subroot = makeNode(PlannerInfo);
subroot->parse = subquery;
subroot->planner_cxt = CurrentMemoryContext;
subroot->in_info_list = NIL;
subroot->append_rel_list = NIL;


@ -22,7 +22,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/prep/prepunion.c,v 1.135 2007/01/05 22:19:32 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/prep/prepunion.c,v 1.136 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -1195,10 +1195,8 @@ adjust_appendrel_attrs_mutator(Node *node, AppendRelInfo *context)
*/
newinfo->eval_cost.startup = -1;
newinfo->this_selec = -1;
newinfo->left_pathkey = NIL;
newinfo->right_pathkey = NIL;
newinfo->left_mergescansel = -1;
newinfo->right_mergescansel = -1;
newinfo->left_ec = NULL;
newinfo->right_ec = NULL;
newinfo->left_bucketsize = -1;
newinfo->right_bucketsize = -1;


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/joininfo.c,v 1.46 2007/01/05 22:19:32 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/joininfo.c,v 1.47 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -16,6 +16,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
/*
@ -54,6 +55,13 @@ have_relevant_joinclause(PlannerInfo *root,
}
}
/*
* We also need to check the EquivalenceClass data structure, which
* might contain relationships not emitted into the joininfo lists.
*/
if (!result && rel1->has_eclass_joins && rel2->has_eclass_joins)
result = have_relevant_eclass_joinclause(root, rel1, rel2);
/*
* It's possible that the rels correspond to the left and right sides
* of a degenerate outer join, that is, one with no joinclause mentioning
@ -124,37 +132,3 @@ add_join_clause_to_rels(PlannerInfo *root,
}
bms_free(tmprelids);
}
/*
* remove_join_clause_from_rels
* Delete 'restrictinfo' from all the joininfo lists it is in
*
* This reverses the effect of add_join_clause_to_rels. It's used when we
* discover that a join clause is redundant.
*
* 'restrictinfo' describes the join clause
* 'join_relids' is the list of relations participating in the join clause
* (there must be more than one)
*/
void
remove_join_clause_from_rels(PlannerInfo *root,
RestrictInfo *restrictinfo,
Relids join_relids)
{
Relids tmprelids;
int cur_relid;
tmprelids = bms_copy(join_relids);
while ((cur_relid = bms_first_member(tmprelids)) >= 0)
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
*/
Assert(list_member_ptr(rel->joininfo, restrictinfo));
rel->joininfo = list_delete_ptr(rel->joininfo, restrictinfo);
}
bms_free(tmprelids);
}


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.136 2007/01/10 18:06:04 tgl Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/pathnode.c,v 1.137 2007/01/20 20:45:39 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -26,7 +26,6 @@
#include "parser/parse_expr.h"
#include "parser/parse_oper.h"
#include "parser/parsetree.h"
#include "utils/memutils.h"
#include "utils/selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
@ -747,11 +746,11 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath)
return (UniquePath *) rel->cheapest_unique_path;
/*
* We must ensure path struct is allocated in same context as parent rel;
* We must ensure path struct is allocated in main planning context;
* otherwise GEQO memory management causes trouble. (Compare
* best_inner_indexscan().)
*/
oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
oldcontext = MemoryContextSwitchTo(root->planner_cxt);
pathnode = makeNode(UniquePath);
@ -1198,11 +1197,6 @@ create_nestloop_path(PlannerInfo *root,
* 'pathkeys' are the path keys of the new join path
* 'mergeclauses' are the RestrictInfo nodes to use as merge clauses
* (this should be a subset of the restrict_clauses list)
* 'mergefamilies' are the btree opfamily OIDs identifying the merge
* ordering for each merge clause
* 'mergestrategies' are the btree operator strategies identifying the merge
* ordering for each merge clause
* 'mergenullsfirst' are the nulls first/last flags for each merge clause
* 'outersortkeys' are the sort varkeys for the outer relation
* 'innersortkeys' are the sort varkeys for the inner relation
*/
@ -1215,9 +1209,6 @@ create_mergejoin_path(PlannerInfo *root,
List *restrict_clauses,
List *pathkeys,
List *mergeclauses,
Oid *mergefamilies,
int *mergestrategies,
bool *mergenullsfirst,
List *outersortkeys,
List *innersortkeys)
{
@ -1258,9 +1249,6 @@ create_mergejoin_path(PlannerInfo *root,
pathnode->jpath.joinrestrictinfo = restrict_clauses;
pathnode->jpath.path.pathkeys = pathkeys;
pathnode->path_mergeclauses = mergeclauses;
pathnode->path_mergeFamilies = mergefamilies;
pathnode->path_mergeStrategies = mergestrategies;
pathnode->path_mergeNullsFirst = mergenullsfirst;
pathnode->outersortkeys = outersortkeys;
pathnode->innersortkeys = innersortkeys;


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/relnode.c,v 1.84 2007/01/05 22:19:33 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/relnode.c,v 1.85 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -16,6 +16,7 @@
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
@ -31,17 +32,18 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
static List *build_joinrel_restrictlist(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel,
JoinType jointype);
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
List *joininfo_list);
static void subbuild_joinrel_joinlist(RelOptInfo *joinrel,
List *joininfo_list);
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
List *joininfo_list,
List *new_joininfo);
/*
@ -84,6 +86,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->index_outer_relids = NULL;
rel->index_inner_paths = NIL;
@ -303,8 +306,7 @@ build_join_rel(PlannerInfo *root,
*restrictlist_ptr = build_joinrel_restrictlist(root,
joinrel,
outer_rel,
inner_rel,
jointype);
inner_rel);
return joinrel;
}
@ -335,6 +337,7 @@ build_join_rel(PlannerInfo *root,
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->index_outer_relids = NULL;
joinrel->index_inner_paths = NIL;
@ -354,15 +357,18 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
restrictlist = build_joinrel_restrictlist(root,
joinrel,
outer_rel,
inner_rel,
jointype);
restrictlist = build_joinrel_restrictlist(root, joinrel,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
build_joinrel_joinlist(joinrel, outer_rel, inner_rel);
/*
* This is also the right place to check whether the joinrel has any
* pending EquivalenceClass joins.
*/
joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
/*
* Set estimates of the joinrel's size.
*/
@ -468,15 +474,15 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* join paths made from this pair of sub-relations. (It will not need to
* be considered further up the join tree.)
*
* When building a restriction list, we eliminate redundant clauses.
* We don't try to do that for join clause lists, since the join clauses
* aren't really doing anything, just waiting to become part of higher
* levels' restriction lists.
* In many cases we will find the same RestrictInfos in both input
* relations' joinlists, so be careful to eliminate duplicates.
* Pointer equality should be a sufficient test for dups, since all
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
* 'joinrel' is a join relation node
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
* 'jointype' is the type of join used.
*
* build_joinrel_restrictlist() returns a list of relevant restrictinfos,
* whereas build_joinrel_joinlist() stores its results in the joinrel's
@ -491,33 +497,27 @@ static List *
build_joinrel_restrictlist(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel,
JoinType jointype)
RelOptInfo *inner_rel)
{
List *result;
List *rlist;
/*
* Collect all the clauses that syntactically belong at this level.
* Collect all the clauses that syntactically belong at this level,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
rlist = list_concat(subbuild_joinrel_restrictlist(joinrel,
outer_rel->joininfo),
subbuild_joinrel_restrictlist(joinrel,
inner_rel->joininfo));
result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
/*
* Eliminate duplicate and redundant clauses.
*
* We must eliminate duplicates, since we will see many of the same
* clauses arriving from both input relations. Also, if a clause is a
* mergejoinable clause, it's possible that it is redundant with previous
* clauses (see optimizer/README for discussion). We detect that case and
* omit the redundant clause from the result list.
* Add on any clauses derived from EquivalenceClasses. These cannot be
* redundant with the clauses in the joininfo lists, so don't bother
* checking.
*/
result = remove_redundant_join_clauses(root, rlist,
IS_OUTER_JOIN(jointype));
list_free(rlist);
result = list_concat(result,
generate_join_implied_equalities(root,
joinrel,
outer_rel,
inner_rel));
return result;
}
@ -527,15 +527,24 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
subbuild_joinrel_joinlist(joinrel, outer_rel->joininfo);
subbuild_joinrel_joinlist(joinrel, inner_rel->joininfo);
List *result;
/*
* Collect all the clauses that syntactically belong above this level,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
result = subbuild_joinrel_joinlist(joinrel, outer_rel->joininfo, NIL);
result = subbuild_joinrel_joinlist(joinrel, inner_rel->joininfo, result);
joinrel->joininfo = result;
}
static List *
subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
List *joininfo_list)
List *joininfo_list,
List *new_restrictlist)
{
List *restrictlist = NIL;
ListCell *l;
foreach(l, joininfo_list)
@ -546,10 +555,12 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
/*
* This clause becomes a restriction clause for the joinrel, since
* it refers to no outside rels. We don't bother to check for
* duplicates here --- build_joinrel_restrictlist will do that.
* it refers to no outside rels. Add it to the list, being
* careful to eliminate duplicates. (Since RestrictInfo nodes in
* different joinlists will have been multiply-linked rather than
* copied, pointer equality should be a sufficient test.)
*/
restrictlist = lappend(restrictlist, rinfo);
new_restrictlist = list_append_unique_ptr(new_restrictlist, rinfo);
}
else
{
@ -560,12 +571,13 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
}
}
return restrictlist;
return new_restrictlist;
}
static void
static List *
subbuild_joinrel_joinlist(RelOptInfo *joinrel,
List *joininfo_list)
List *joininfo_list,
List *new_joininfo)
{
ListCell *l;
@ -585,15 +597,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
{
/*
* This clause is still a join clause at this level, so add it to
* the joininfo list for the joinrel, being careful to eliminate
* duplicates. (Since RestrictInfo nodes are normally
* multiply-linked rather than copied, pointer equality should be
* a sufficient test. If two equal() nodes should happen to sneak
* in, no great harm is done --- they'll be detected by
* redundant-clause testing when they reach a restriction list.)
* the new joininfo list, being careful to eliminate
* duplicates. (Since RestrictInfo nodes in different joinlists
* will have been multiply-linked rather than copied, pointer
* equality should be a sufficient test.)
*/
joinrel->joininfo = list_append_unique_ptr(joinrel->joininfo,
rinfo);
new_joininfo = list_append_unique_ptr(new_joininfo, rinfo);
}
}
return new_joininfo;
}
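
The rewritten `subbuild_joinrel_*` helpers rely on pointer equality to drop clauses that arrive from both input relations. The following is a minimal, self-contained sketch of that pattern, assuming a fixed-capacity array in place of the real `List` type. `PtrList`, `ptrlist_append_unique`, and `ptrlist_merge_unique` are hypothetical names used for illustration and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLAUSES 32

/* Hypothetical fixed-capacity list of clause pointers. */
typedef struct PtrList
{
	const void *items[MAX_CLAUSES];
	size_t		len;
} PtrList;

/*
 * Append ptr unless the identical pointer is already present.
 * Returns 1 if the pointer was appended, 0 if it was a duplicate.
 */
int
ptrlist_append_unique(PtrList *list, const void *ptr)
{
	for (size_t i = 0; i < list->len; i++)
	{
		if (list->items[i] == ptr)
			return 0;			/* same node already linked in */
	}
	assert(list->len < MAX_CLAUSES);
	list->items[list->len++] = ptr;
	return 1;
}

/* Fold every entry of src into dst, skipping pointers dst already holds. */
void
ptrlist_merge_unique(PtrList *dst, const PtrList *src)
{
	for (size_t i = 0; i < src->len; i++)
		ptrlist_append_unique(dst, src->items[i]);
}
```

Merging the outer list and then the inner list into one initially empty accumulator mirrors how the new code threads `result` through two successive calls. The pointer test is only sufficient when shared clauses are multiply-linked rather than copied, as the comments in the diff note.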


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/optimizer/util/restrictinfo.c,v 1.51 2007/01/05 22:19:33 momjian Exp $
* $PostgreSQL: pgsql/src/backend/optimizer/util/restrictinfo.c,v 1.52 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -33,10 +33,9 @@ static Expr *make_sub_restrictinfos(Expr *clause,
bool outerjoin_delayed,
bool pseudoconstant,
Relids required_relids);
static RestrictInfo *join_clause_is_redundant(PlannerInfo *root,
static bool join_clause_is_redundant(PlannerInfo *root,
RestrictInfo *rinfo,
List *reference_list,
bool isouterjoin);
List *reference_list);
/*
@ -336,19 +335,17 @@ make_restrictinfo_internal(Expr *clause,
* that happens only if it appears in the right context (top level of a
* joinclause list).
*/
restrictinfo->parent_ec = NULL;
restrictinfo->eval_cost.startup = -1;
restrictinfo->this_selec = -1;
restrictinfo->mergejoinoperator = InvalidOid;
restrictinfo->left_sortop = InvalidOid;
restrictinfo->right_sortop = InvalidOid;
restrictinfo->mergeopfamily = InvalidOid;
restrictinfo->mergeopfamilies = NIL;
restrictinfo->left_pathkey = NIL;
restrictinfo->right_pathkey = NIL;
restrictinfo->left_ec = NULL;
restrictinfo->right_ec = NULL;
restrictinfo->left_mergescansel = -1;
restrictinfo->right_mergescansel = -1;
restrictinfo->outer_is_left = false;
restrictinfo->hashjoinoperator = InvalidOid;
@ -529,78 +526,18 @@ extract_actual_join_clauses(List *restrictinfo_list,
}
}
/*
* remove_redundant_join_clauses
*
* Given a list of RestrictInfo clauses that are to be applied in a join,
* remove any duplicate or redundant clauses.
*
* We must eliminate duplicates when forming the restrictlist for a joinrel,
* since we will see many of the same clauses arriving from both input
* relations. Also, if a clause is a mergejoinable clause, it's possible that
* it is redundant with previous clauses (see optimizer/README for
* discussion). We detect that case and omit the redundant clause from the
* result list.
*
* The result is a fresh List, but it points to the same member nodes
* as were in the input.
*/
List *
remove_redundant_join_clauses(PlannerInfo *root, List *restrictinfo_list,
bool isouterjoin)
{
List *result = NIL;
ListCell *item;
QualCost cost;
/*
* If there are any redundant clauses, we want to eliminate the ones that
* are more expensive in favor of the ones that are less so. Run
* cost_qual_eval() to ensure the eval_cost fields are set up.
*/
cost_qual_eval(&cost, restrictinfo_list);
/*
* We don't have enough knowledge yet to be able to estimate the number of
* times a clause might be evaluated, so it's hard to weight the startup
* and per-tuple costs appropriately. For now just weight 'em the same.
*/
#define CLAUSECOST(r) ((r)->eval_cost.startup + (r)->eval_cost.per_tuple)
foreach(item, restrictinfo_list)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(item);
RestrictInfo *prevrinfo;
/* is it redundant with any prior clause? */
prevrinfo = join_clause_is_redundant(root, rinfo, result, isouterjoin);
if (prevrinfo == NULL)
{
/* no, so add it to result list */
result = lappend(result, rinfo);
}
else if (CLAUSECOST(rinfo) < CLAUSECOST(prevrinfo))
{
/* keep this one, drop the previous one */
result = list_delete_ptr(result, prevrinfo);
result = lappend(result, rinfo);
}
/* else, drop this one */
}
return result;
}
/*
* select_nonredundant_join_clauses
*
* Given a list of RestrictInfo clauses that are to be applied in a join,
* select the ones that are not redundant with any clause in the
* reference_list.
* reference_list. This is used only for nestloop-with-inner-indexscan
* joins: any clauses being checked by the index should be removed from
* the qpquals list.
*
* This is similar to remove_redundant_join_clauses, but we are looking for
* redundancies with a separate list of clauses (i.e., clauses that have
* already been applied below the join itself).
* "Redundant" means either equal() or derived from the same EquivalenceClass.
* We have to check the latter because indxqual.c may select different derived
* clauses than were selected by generate_join_implied_equalities().
*
* Note that we assume the given restrictinfo_list has already been checked
* for local redundancies, so we don't check again.
@ -608,8 +545,7 @@ remove_redundant_join_clauses(PlannerInfo *root, List *restrictinfo_list,
List *
select_nonredundant_join_clauses(PlannerInfo *root,
List *restrictinfo_list,
List *reference_list,
bool isouterjoin)
List *reference_list)
{
List *result = NIL;
ListCell *item;
@ -619,7 +555,7 @@ select_nonredundant_join_clauses(PlannerInfo *root,
RestrictInfo *rinfo = (RestrictInfo *) lfirst(item);
/* drop it if redundant with any reference clause */
if (join_clause_is_redundant(root, rinfo, reference_list, isouterjoin) != NULL)
if (join_clause_is_redundant(root, rinfo, reference_list))
continue;
/* otherwise, add it to result list */
@ -631,79 +567,28 @@ select_nonredundant_join_clauses(PlannerInfo *root,
/*
* join_clause_is_redundant
* If rinfo is redundant with any clause in reference_list,
* return one such clause; otherwise return NULL.
*
* This is the guts of both remove_redundant_join_clauses and
* select_nonredundant_join_clauses. See the docs above for motivation.
*
* We can detect redundant mergejoinable clauses very cheaply by using their
* left and right pathkeys, which uniquely identify the sets of equijoined
* variables in question. All the members of a pathkey set that are in the
* left relation have already been forced to be equal; likewise for those in
* the right relation. So, we need to have only one clause that checks
* equality between any set member on the left and any member on the right;
* by transitivity, all the rest are then equal.
*
* However, clauses that are of the form "var expr = const expr" cannot be
* eliminated as redundant. This is because when there are const expressions
* in a pathkey set, generate_implied_equalities() suppresses "var = var"
* clauses in favor of "var = const" clauses. We cannot afford to drop any
* of the latter, even though they might seem redundant by the pathkey
* membership test.
*
* Weird special case: if we have two clauses that seem redundant
* except one is pushed down into an outer join and the other isn't,
* then they're not really redundant, because one constrains the
* joined rows after addition of null fill rows, and the other doesn't.
* Test whether rinfo is redundant with any clause in reference_list.
*/
static RestrictInfo *
static bool
join_clause_is_redundant(PlannerInfo *root,
RestrictInfo *rinfo,
List *reference_list,
bool isouterjoin)
List *reference_list)
{
ListCell *refitem;
/* always consider exact duplicates redundant */
foreach(refitem, reference_list)
{
RestrictInfo *refrinfo = (RestrictInfo *) lfirst(refitem);
/* always consider exact duplicates redundant */
if (equal(rinfo, refrinfo))
return refrinfo;
return true;
/* check if derived from same EquivalenceClass */
if (rinfo->parent_ec != NULL &&
rinfo->parent_ec == refrinfo->parent_ec)
return true;
}
/* check for redundant merge clauses */
if (rinfo->mergejoinoperator != InvalidOid)
{
/* do the cheap test first: is it a "var = const" clause? */
if (bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids))
return NULL; /* var = const, so not redundant */
cache_mergeclause_pathkeys(root, rinfo);
foreach(refitem, reference_list)
{
RestrictInfo *refrinfo = (RestrictInfo *) lfirst(refitem);
if (refrinfo->mergejoinoperator != InvalidOid)
{
cache_mergeclause_pathkeys(root, refrinfo);
if (rinfo->left_pathkey == refrinfo->left_pathkey &&
rinfo->right_pathkey == refrinfo->right_pathkey &&
(rinfo->is_pushed_down == refrinfo->is_pushed_down ||
!isouterjoin))
{
/* Yup, it's redundant */
return refrinfo;
}
}
}
}
/* otherwise, not redundant */
return NULL;
return false;
}
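
The simplified `join_clause_is_redundant` treats a clause as redundant if it is an exact duplicate of a reference clause or if both were derived from the same EquivalenceClass. The sketch below restates that rule on a toy type so it can be compiled on its own. `ToyClause` and `toy_clause_is_redundant` are hypothetical, and comparing a text field with `strcmp` is an assumption standing in for the real `equal()` node comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a RestrictInfo. */
typedef struct ToyClause
{
	const char *text;			/* stands in for the clause expression */
	const void *parent_ec;		/* NULL if not derived from an equivalence class */
} ToyClause;

/*
 * Return 1 if clause is redundant with any of the nrefs reference clauses:
 * either an exact duplicate, or derived from the same equivalence class.
 */
int
toy_clause_is_redundant(const ToyClause *clause,
						const ToyClause *refs, size_t nrefs)
{
	assert(clause != NULL);
	for (size_t i = 0; i < nrefs; i++)
	{
		/* exact duplicates are always redundant */
		if (strcmp(clause->text, refs[i].text) == 0)
			return 1;

		/* so are two clauses derived from the same class */
		if (clause->parent_ec != NULL &&
			clause->parent_ec == refs[i].parent_ec)
			return 1;
	}
	return 0;
}
```

The `parent_ec != NULL` guard matters: two unrelated clauses that were not derived from any class both carry NULL and must not be treated as redundant with each other.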


@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/parser/parse_agg.c,v 1.75 2007/01/05 22:19:33 momjian Exp $
* $PostgreSQL: pgsql/src/backend/parser/parse_agg.c,v 1.76 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -171,6 +171,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
{
root = makeNode(PlannerInfo);
root->parse = qry;
root->planner_cxt = CurrentMemoryContext;
root->hasJoinRTEs = true;
groupClauses = (List *) flatten_join_alias_vars(root,


@ -15,7 +15,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.219 2007/01/09 02:14:14 tgl Exp $
* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.220 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -2345,7 +2345,7 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
* expressional index for which we have statistics, then we treat the
* whole expression as though it were just a Var.
* 2. If the list contains Vars of different relations that are known equal
* due to equijoin clauses, then drop all but one of the Vars from each
* due to equivalence classes, then drop all but one of the Vars from each
* known-equal set, keeping the one with smallest estimated # of values
* (since the extra values of the others can't appear in joined rows).
* Note the reason we only consider Vars of different relations is that
@ -2365,10 +2365,9 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
* 4. If there are Vars from multiple rels, we repeat step 3 for each such
* rel, and multiply the results together.
* Note that rels not containing grouped Vars are ignored completely, as are
* join clauses other than the equijoin clauses used in step 2. Such rels
* cannot increase the number of groups, and we assume such clauses do not
* reduce the number either (somewhat bogus, but we don't have the info to
* do better).
* join clauses. Such rels cannot increase the number of groups, and we
* assume such clauses do not reduce the number either (somewhat bogus,
* but we don't have the info to do better).
*/
double
estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows)


@ -7,7 +7,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/utils/cache/lsyscache.c,v 1.143 2007/01/10 18:06:04 tgl Exp $
* $PostgreSQL: pgsql/src/backend/utils/cache/lsyscache.c,v 1.144 2007/01/20 20:45:40 tgl Exp $
*
* NOTES
* Eventually, the index information should go through here, too.
@ -138,153 +138,6 @@ get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype,
return result;
}
/*
* get_op_mergejoin_info
* Given the OIDs of a (putatively) mergejoinable equality operator
* and a sortop defining the sort ordering of the lefthand input of
* the merge clause, determine whether this sort ordering is actually
* usable for merging. If so, return the required sort ordering op
* for the righthand input, as well as the btree opfamily OID containing
* these operators and the operator strategy number of the two sortops
* (either BTLessStrategyNumber or BTGreaterStrategyNumber).
*
* We can mergejoin if we find the two operators in the same opfamily as
* equality and either less-than or greater-than respectively. If there
* are multiple such opfamilies, assume we can use any one.
*/
#ifdef NOT_YET
/* eventually should look like this */
bool
get_op_mergejoin_info(Oid eq_op, Oid left_sortop,
Oid *right_sortop, Oid *opfamily, int *opstrategy)
{
bool result = false;
Oid lefttype;
Oid righttype;
CatCList *catlist;
int i;
/* Make sure output args are initialized even on failure */
*right_sortop = InvalidOid;
*opfamily = InvalidOid;
*opstrategy = 0;
/* Need the righthand input datatype */
op_input_types(eq_op, &lefttype, &righttype);
/*
* Search through all the pg_amop entries containing the equality operator
*/
catlist = SearchSysCacheList(AMOPOPID, 1,
ObjectIdGetDatum(eq_op),
0, 0, 0);
for (i = 0; i < catlist->n_members; i++)
{
HeapTuple op_tuple = &catlist->members[i]->tuple;
Form_pg_amop op_form = (Form_pg_amop) GETSTRUCT(op_tuple);
Oid opfamily_id;
StrategyNumber op_strategy;
/* must be btree */
if (op_form->amopmethod != BTREE_AM_OID)
continue;
/* must use the operator as equality */
if (op_form->amopstrategy != BTEqualStrategyNumber)
continue;
/* See if sort operator is also in this opfamily with OK semantics */
opfamily_id = op_form->amopfamily;
op_strategy = get_op_opfamily_strategy(left_sortop, opfamily_id);
if (op_strategy == BTLessStrategyNumber ||
op_strategy == BTGreaterStrategyNumber)
{
/* Yes, so find the corresponding righthand sortop */
*right_sortop = get_opfamily_member(opfamily_id,
righttype,
righttype,
op_strategy);
if (OidIsValid(*right_sortop))
{
/* Found a workable mergejoin semantics */
*opfamily = opfamily_id;
*opstrategy = op_strategy;
result = true;
break;
}
}
}
ReleaseSysCacheList(catlist);
return result;
}
#else
/* temp implementation until planner gets smarter: left_sortop is output */
bool
get_op_mergejoin_info(Oid eq_op, Oid *left_sortop,
Oid *right_sortop, Oid *opfamily)
{
bool result = false;
Oid lefttype;
Oid righttype;
CatCList *catlist;
int i;
/* Make sure output args are initialized even on failure */
*left_sortop = InvalidOid;
*right_sortop = InvalidOid;
*opfamily = InvalidOid;
/* Need the input datatypes */
op_input_types(eq_op, &lefttype, &righttype);
/*
* Search through all the pg_amop entries containing the equality operator
*/
catlist = SearchSysCacheList(AMOPOPID, 1,
ObjectIdGetDatum(eq_op),
0, 0, 0);
for (i = 0; i < catlist->n_members; i++)
{
HeapTuple op_tuple = &catlist->members[i]->tuple;
Form_pg_amop op_form = (Form_pg_amop) GETSTRUCT(op_tuple);
Oid opfamily_id;
/* must be btree */
if (op_form->amopmethod != BTREE_AM_OID)
continue;
/* must use the operator as equality */
if (op_form->amopstrategy != BTEqualStrategyNumber)
continue;
opfamily_id = op_form->amopfamily;
/* Find the matching sortops */
*left_sortop = get_opfamily_member(opfamily_id,
lefttype,
lefttype,
BTLessStrategyNumber);
*right_sortop = get_opfamily_member(opfamily_id,
righttype,
righttype,
BTLessStrategyNumber);
if (OidIsValid(*left_sortop) && OidIsValid(*right_sortop))
{
/* Found a workable mergejoin semantics */
*opfamily = opfamily_id;
result = true;
break;
}
}
ReleaseSysCacheList(catlist);
return result;
}
#endif
/*
* get_compare_function_for_ordering_op
* Get the OID of the datatype-specific btree comparison function
@ -469,6 +322,56 @@ get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type)
return result;
}
/*
* get_mergejoin_opfamilies
* Given a putatively mergejoinable operator, return a list of the OIDs
* of the btree opfamilies in which it represents equality.
*
* It is possible (though at present unusual) for an operator to be equality
* in more than one opfamily, hence the result is a list. This also lets us
* return NIL if the operator is not found in any opfamilies.
*
* The planner currently uses simple equal() tests to compare the lists
* returned by this function, which makes the list order relevant, though
* strictly speaking it should not be. Because of the way syscache list
* searches are handled, in normal operation the result will be sorted by OID
* so everything works fine. If running with system index usage disabled,
* the result ordering is unspecified and hence the planner might fail to
* recognize optimization opportunities ... but that's hardly a scenario in
* which performance is good anyway, so there's no point in expending code
* or cycles here to guarantee the ordering in that case.
*/
List *
get_mergejoin_opfamilies(Oid opno)
{
List *result = NIL;
CatCList *catlist;
int i;
/*
* Search pg_amop to see if the target operator is registered as the "="
* operator of any btree opfamily.
*/
catlist = SearchSysCacheList(AMOPOPID, 1,
ObjectIdGetDatum(opno),
0, 0, 0);
for (i = 0; i < catlist->n_members; i++)
{
HeapTuple tuple = &catlist->members[i]->tuple;
Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple);
/* must be btree equality */
if (aform->amopmethod == BTREE_AM_OID &&
aform->amopstrategy == BTEqualStrategyNumber)
result = lappend_oid(result, aform->amopfamily);
}
ReleaseSysCacheList(catlist);
return result;
}
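
The filtering step in `get_mergejoin_opfamilies` can be tried outside the backend with a toy table in place of the `pg_amop` syscache. In this sketch the `ToyAmopRow` type, the `TOY_*` constants, and `toy_mergejoin_opfamilies` are all hypothetical, and the numeric values are illustrative rather than real catalog OIDs. Only the selection rule (btree access method, equality strategy) is taken from the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for catalog constants; values are illustrative only. */
enum { TOY_BTREE_AM = 1, TOY_HASH_AM = 2 };
enum { TOY_LESS_STRATEGY = 1, TOY_EQUAL_STRATEGY = 3 };

/* Hypothetical flattened view of one operator-family membership row. */
typedef struct ToyAmopRow
{
	unsigned	opno;			/* operator id */
	unsigned	method;			/* access method the family belongs to */
	int			strategy;		/* role the operator plays in the family */
	unsigned	family;			/* operator family id */
} ToyAmopRow;

/*
 * Collect the families in which opno is btree equality, in table order.
 * Writes at most outcap entries to out and returns the number written.
 */
size_t
toy_mergejoin_opfamilies(const ToyAmopRow *rows, size_t nrows,
						 unsigned opno, unsigned *out, size_t outcap)
{
	size_t		n = 0;

	assert(out != NULL || outcap == 0);
	for (size_t i = 0; i < nrows; i++)
	{
		if (rows[i].opno != opno)
			continue;
		/* must be btree equality */
		if (rows[i].method == TOY_BTREE_AM &&
			rows[i].strategy == TOY_EQUAL_STRATEGY &&
			n < outcap)
			out[n++] = rows[i].family;
	}
	return n;
}
```

Because the result preserves table order, two calls over the same sorted table yield lists that compare equal element by element, which is the property the comment above says the planner currently depends on.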
/*
* get_compatible_hash_operator
* Get the OID of a hash equality operator compatible with the given


@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/nodes/nodes.h,v 1.191 2007/01/05 22:19:55 momjian Exp $
* $PostgreSQL: pgsql/src/include/nodes/nodes.h,v 1.192 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -190,7 +190,9 @@ typedef enum NodeTag
T_ResultPath,
T_MaterialPath,
T_UniquePath,
T_PathKeyItem,
T_EquivalenceClass,
T_EquivalenceMember,
T_PathKey,
T_RestrictInfo,
T_InnerIndexscanInfo,
T_OuterJoinInfo,

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/nodes/relation.h,v 1.132 2007/01/10 18:06:04 tgl Exp $
* $PostgreSQL: pgsql/src/include/nodes/relation.h,v 1.133 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -69,7 +69,7 @@ typedef struct PlannerInfo
* does not correspond to a base relation, such as a join RTE or an
* unreferenced view RTE; or if the RelOptInfo hasn't been made yet.
*/
struct RelOptInfo **simple_rel_array; /* All 1-relation RelOptInfos */
struct RelOptInfo **simple_rel_array; /* All 1-rel RelOptInfos */
int simple_rel_array_size; /* allocated size of array */
/*
@ -84,18 +84,20 @@ typedef struct PlannerInfo
List *join_rel_list; /* list of join-relation RelOptInfos */
struct HTAB *join_rel_hash; /* optional hashtable for join relations */
List *equi_key_list; /* list of lists of equijoined PathKeyItems */
List *eq_classes; /* list of active EquivalenceClasses */
List *left_join_clauses; /* list of RestrictInfos for outer
* join clauses w/nonnullable var on
* left */
List *canon_pathkeys; /* list of "canonical" PathKeys */
List *right_join_clauses; /* list of RestrictInfos for outer
* join clauses w/nonnullable var on
* right */
List *left_join_clauses; /* list of RestrictInfos for
* mergejoinable outer join clauses
* w/nonnullable var on left */
List *full_join_clauses; /* list of RestrictInfos for full
* outer join clauses */
List *right_join_clauses; /* list of RestrictInfos for
* mergejoinable outer join clauses
* w/nonnullable var on right */
List *full_join_clauses; /* list of RestrictInfos for
* mergejoinable full join clauses */
List *oj_info_list; /* list of OuterJoinInfos */
@ -109,6 +111,8 @@ typedef struct PlannerInfo
List *group_pathkeys; /* groupClause pathkeys, if any */
List *sort_pathkeys; /* sortClause pathkeys, if any */
MemoryContext planner_cxt; /* context holding PlannerInfo */
double total_table_pages; /* # of pages in all tables of query */
double tuple_fraction; /* tuple_fraction passed to query_planner */
@ -209,7 +213,10 @@ typedef struct PlannerInfo
* baserestrictcost - Estimated cost of evaluating the baserestrictinfo
* clauses at a single tuple (only used for base rels)
* joininfo - List of RestrictInfo nodes, containing info about each
* join clause in which this relation participates
* join clause in which this relation participates (but
* note this excludes clauses that might be derivable from
* EquivalenceClasses)
* has_eclass_joins - flag that EquivalenceClass joins are possible
* index_outer_relids - only used for base rels; set of outer relids
* that participate in indexable joinclauses for this rel
* index_inner_paths - only used for base rels; list of InnerIndexscanInfo
@ -278,6 +285,7 @@ typedef struct RelOptInfo
QualCost baserestrictcost; /* cost of evaluating the above */
List *joininfo; /* RestrictInfo structures for join clauses
* involving this rel */
bool has_eclass_joins; /* T means joininfo is incomplete */
/* cached info about inner indexscan paths for relation: */
Relids index_outer_relids; /* other relids in indexable join
@ -349,31 +357,106 @@ typedef struct IndexOptInfo
/*
* PathKeys
* EquivalenceClasses
*
* The sort ordering of a path is represented by a list of sublists of
* PathKeyItem nodes. An empty list implies no known ordering. Otherwise
* the first sublist represents the primary sort key, the second the
* first secondary sort key, etc. Each sublist contains one or more
* PathKeyItem nodes, each of which can be taken as the attribute that
* appears at that sort position. (See optimizer/README for more
* information.)
* Whenever we can determine that a mergejoinable equality clause A = B is
* not delayed by any outer join, we create an EquivalenceClass containing
* the expressions A and B to record this knowledge. If we later find another
* equivalence B = C, we add C to the existing EquivalenceClass; this may
* require merging two existing EquivalenceClasses. At the end of the qual
* distribution process, we have sets of values that are known all transitively
* equal to each other, where "equal" is according to the rules of the btree
* operator family(s) shown in ec_opfamilies. (We restrict an EC to contain
* only equalities whose operators belong to the same set of opfamilies. This
* could probably be relaxed, but for now it's not worth the trouble, since
* nearly all equality operators belong to only one btree opclass anyway.)
*
* We also use EquivalenceClasses as the base structure for PathKeys, letting
* us represent knowledge about different sort orderings being equivalent.
* Since every PathKey must reference an EquivalenceClass, we will end up
* with single-member EquivalenceClasses whenever a sort key expression has
* not been equivalenced to anything else. It is also possible that such an
* EquivalenceClass will contain a volatile expression ("ORDER BY random()"),
* which is a case that can't arise otherwise since clauses containing
* volatile functions are never considered mergejoinable. We mark such
* EquivalenceClasses specially to prevent them from being merged with
* ordinary EquivalenceClasses.
*
* We allow equality clauses appearing below the nullable side of an outer join
* to form EquivalenceClasses, but these have a slightly different meaning:
* the included values might be all NULL rather than all the same non-null
* values. See src/backend/optimizer/README for more on that point.
*
* NB: if ec_merged isn't NULL, this class has been merged into another, and
* should be ignored in favor of using the pointed-to class.
*/
typedef struct PathKeyItem
typedef struct EquivalenceClass
{
NodeTag type;
Node *key; /* the item that is ordered */
Oid sortop; /* the ordering operator ('<' op) */
bool nulls_first; /* do NULLs come before normal values? */
List *ec_opfamilies; /* btree operator family OIDs */
List *ec_members; /* list of EquivalenceMembers */
List *ec_sources; /* list of generating RestrictInfos */
Relids ec_relids; /* all relids appearing in ec_members */
bool ec_has_const; /* any pseudoconstants in ec_members? */
bool ec_has_volatile; /* the (sole) member is a volatile expr */
bool ec_below_outer_join; /* equivalence applies below an OJ */
bool ec_broken; /* failed to generate needed clauses? */
struct EquivalenceClass *ec_merged; /* set if merged into another EC */
} EquivalenceClass;
/*
* key typically points to a Var node, ie a relation attribute, but it can
* also point to an arbitrary expression representing the value indexed by
* an index expression.
*/
} PathKeyItem;
/*
* EquivalenceMember - one member expression of an EquivalenceClass
*
* em_is_child signifies that this element was built by transposing a member
* for an inheritance parent relation to represent the corresponding expression
* on an inheritance child. The element should be ignored for all purposes
* except constructing inner-indexscan paths for the child relation. (Other
* types of join are driven from transposed joininfo-list entries.) Note
* that the EC's ec_relids field does NOT include the child relation.
*
* em_datatype is usually the same as exprType(em_expr), but can be
* different when dealing with a binary-compatible opfamily; in particular
* anyarray_ops would never work without this. Use em_datatype when
* looking up a specific btree operator to work with this expression.
*/
typedef struct EquivalenceMember
{
NodeTag type;
Expr *em_expr; /* the expression represented */
Relids em_relids; /* all relids appearing in em_expr */
bool em_is_const; /* expression is pseudoconstant? */
bool em_is_child; /* derived version for a child relation? */
Oid em_datatype; /* the "nominal type" used by the opfamily */
} EquivalenceMember;
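
Reviewer note: the EquivalenceClass comment describes adding C to an existing class when B = C is found, possibly merging two classes, and says that a class with ec_merged set should be ignored in favor of the pointed-to class. The toy model below sketches that bookkeeping. `ToyEC`, its fixed-size member array, and all `toy_ec_*` functions are hypothetical simplifications (strings stand in for expressions); this is not the code in equivclass.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MEMBERS 8

typedef struct ToyEC
{
	const char *members[TOY_MAX_MEMBERS];	/* stand-ins for EquivalenceMembers */
	int			nmembers;
	struct ToyEC *merged;		/* mirrors ec_merged: set once absorbed */
} ToyEC;

/* Follow the merged links to the class that should actually be used. */
static ToyEC *
toy_ec_resolve(ToyEC *ec)
{
	while (ec->merged != NULL)
		ec = ec->merged;
	return ec;
}

/* Does the class already list this expression? */
static int
toy_ec_contains(const ToyEC *ec, const char *expr)
{
	int			i;

	for (i = 0; i < ec->nmembers; i++)
	{
		if (strcmp(ec->members[i], expr) == 0)
			return 1;
	}
	return 0;
}

/* Add an expression unless it is already a member. */
static void
toy_ec_add(ToyEC *ec, const char *expr)
{
	if (toy_ec_contains(ec, expr))
		return;
	assert(ec->nmembers < TOY_MAX_MEMBERS);
	ec->members[ec->nmembers++] = expr;
}

/* Fold src into dst and leave a forwarding pointer behind in src. */
static ToyEC *
toy_ec_merge(ToyEC *dst, ToyEC *src)
{
	int			i;

	dst = toy_ec_resolve(dst);
	src = toy_ec_resolve(src);
	if (dst == src)
		return dst;
	for (i = 0; i < src->nmembers; i++)
		toy_ec_add(dst, src->members[i]);
	src->nmembers = 0;
	src->merged = dst;
	return dst;
}
```

Starting from classes {a, b} and {c, d}, learning b = c amounts to calling toy_ec_merge on the two, after which the surviving class holds all four members and any stale pointer to the absorbed class resolves to the survivor.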
/*
* PathKeys
*
* The sort ordering of a path is represented by a list of PathKey nodes.
* An empty list implies no known ordering. Otherwise the first item
* represents the primary sort key, the second the first secondary sort key,
* etc. The value being sorted is represented by linking to an
* EquivalenceClass containing that value and including pk_opfamily among its
* ec_opfamilies. This is a convenient method because it makes it trivial
* to detect equivalent and closely-related orderings. (See optimizer/README
* for more information.)
*
* Note: pk_strategy is either BTLessStrategyNumber (for ASC) or
* BTGreaterStrategyNumber (for DESC). We assume that all ordering-capable
* index types will use btree-compatible strategy numbers.
*/
typedef struct PathKey
{
NodeTag type;
EquivalenceClass *pk_eclass; /* the value that is ordered */
Oid pk_opfamily; /* btree opfamily defining the ordering */
int pk_strategy; /* sort direction (ASC or DESC) */
bool pk_nulls_first; /* do NULLs come before normal values? */
} PathKey;
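
Reviewer note: the PathKeys comment says that linking each sort key to an EquivalenceClass makes it trivial to detect equivalent and closely-related orderings. The sketch below shows one way such a check could look: two keys describe the same ordering when all four fields match, and one ordering satisfies another when it is a leading prefix. `ToyPathKey` and the `toy_*` functions are hypothetical, the strategy constants are local stand-ins for BTLessStrategyNumber and BTGreaterStrategyNumber, and the real pathkeys.c may well compare canonical pointers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid (assumption) */

enum
{
	TOY_LESS_STRATEGY = 1,		/* stand-in for BTLessStrategyNumber (ASC) */
	TOY_GREATER_STRATEGY = 5	/* stand-in for BTGreaterStrategyNumber (DESC) */
};

typedef struct ToyPathKey
{
	const void *eclass;			/* identity of the EquivalenceClass */
	Oid			opfamily;		/* btree opfamily defining the ordering */
	int			strategy;		/* sort direction */
	bool		nulls_first;	/* do NULLs sort first? */
} ToyPathKey;

/* Two keys denote the same ordering only if every field matches. */
static bool
toy_pathkey_same(const ToyPathKey *a, const ToyPathKey *b)
{
	return a->eclass == b->eclass &&
		a->opfamily == b->opfamily &&
		a->strategy == b->strategy &&
		a->nulls_first == b->nulls_first;
}

/* True if keys1 is a leading prefix of keys2, so keys2 satisfies keys1. */
static bool
toy_pathkeys_contained_in(const ToyPathKey *keys1, size_t n1,
						  const ToyPathKey *keys2, size_t n2)
{
	size_t		i;

	if (n1 > n2)
		return false;
	for (i = 0; i < n1; i++)
	{
		if (!toy_pathkey_same(&keys1[i], &keys2[i]))
			return false;
	}
	return true;
}
```

Because the comparison is on class identity rather than on a particular expression, a path sorted by a.x also counts as sorted by b.y whenever both expressions belong to the same class.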
/*
* Type "Path" is used as-is for sequential-scan paths. For other
@ -398,7 +481,7 @@ typedef struct Path
Cost total_cost; /* total cost (assuming all tuples fetched) */
List *pathkeys; /* sort ordering of path's output */
/* pathkeys is a List of Lists of PathKeyItem nodes; see above */
/* pathkeys is a List of PathKey nodes; see above */
} Path;
/*----------
@ -618,11 +701,7 @@ typedef JoinPath NestPath;
* A mergejoin path has these fields.
*
* path_mergeclauses lists the clauses (in the form of RestrictInfos)
* that will be used in the merge. The parallel arrays path_mergeFamilies,
* path_mergeStrategies, and path_mergeNullsFirst specify the merge semantics
* for each clause (i.e., define the relevant sort ordering for each clause).
* (XXX is this the most reasonable path-time representation? It's at least
* partially redundant with the pathkeys of the input paths.)
* that will be used in the merge.
*
* Note that the mergeclauses are a subset of the parent relation's
* restriction-clause list. Any join clauses that are not mergejoinable
@ -639,10 +718,6 @@ typedef struct MergePath
{
JoinPath jpath;
List *path_mergeclauses; /* join clauses to be used for merge */
/* these are arrays, but have the same length as the mergeclauses list: */
Oid *path_mergeFamilies; /* per-clause OIDs of opfamilies */
int *path_mergeStrategies; /* per-clause ordering (ASC or DESC) */
bool *path_mergeNullsFirst; /* per-clause nulls ordering */
List *outersortkeys; /* keys for explicit sort, if any */
List *innersortkeys; /* keys for explicit sort, if any */
} MergePath;
@ -696,6 +771,15 @@ typedef struct HashPath
* sequence we use. So, these clauses cannot be associated directly with
* the join RelOptInfo, but must be kept track of on a per-join-path basis.
*
* RestrictInfos that represent equivalence conditions (i.e., mergejoinable
* equalities that are not outerjoin-delayed) are handled a bit differently.
* Initially we attach them to the EquivalenceClasses that are derived from
* them. When we construct a scan or join path, we look through all the
* EquivalenceClasses and generate derived RestrictInfos representing the
* minimal set of conditions that need to be checked for this particular scan
* or join to enforce that all members of each EquivalenceClass are in fact
* equal in all rows emitted by the scan or join.
*
* When dealing with outer joins we have to be very careful about pushing qual
* clauses up and down the tree. An outer join's own JOIN/ON conditions must
* be evaluated exactly at that join node, and any quals appearing in WHERE or
@ -728,9 +812,9 @@ typedef struct HashPath
*
* In general, the referenced clause might be arbitrarily complex. The
* kinds of clauses we can handle as indexscan quals, mergejoin clauses,
* or hashjoin clauses are fairly limited --- the code for each kind of
* path is responsible for identifying the restrict clauses it can use
* and ignoring the rest. Clauses not implemented by an indexscan,
* or hashjoin clauses are limited (e.g., no volatile functions). The code
* for each kind of path is responsible for identifying the restrict clauses
* it can use and ignoring the rest. Clauses not implemented by an indexscan,
* mergejoin, or hashjoin will be placed in the plan qual or joinqual field
* of the finished Plan node, where they will be enforced by general-purpose
* qual-expression-evaluation code. (But we are still entitled to count
@ -758,6 +842,12 @@ typedef struct HashPath
* estimates. Note that a pseudoconstant clause can never be an indexqual
* or merge or hash join clause, so it's of no interest to large parts of
* the planner.
*
* When join clauses are generated from EquivalenceClasses, there may be
* several equally valid ways to enforce join equivalence, of which we need
* apply only one. We mark clauses of this kind by setting parent_ec to
* point to the generating EquivalenceClass. Multiple clauses with the same
* parent_ec in the same join are redundant.
*/
typedef struct RestrictInfo
@ -787,23 +877,22 @@ typedef struct RestrictInfo
/* This field is NULL unless clause is an OR clause: */
Expr *orclause; /* modified clause with RestrictInfos */
/* This field is NULL unless clause is potentially redundant: */
EquivalenceClass *parent_ec; /* generating EquivalenceClass */
/* cache space for cost and selectivity */
QualCost eval_cost; /* eval cost of clause; -1 if not yet set */
Selectivity this_selec; /* selectivity; -1 if not yet set */
/* valid if clause is mergejoinable, else InvalidOid: */
Oid mergejoinoperator; /* copy of clause operator */
Oid left_sortop; /* leftside sortop needed for mergejoin */
Oid right_sortop; /* rightside sortop needed for mergejoin */
Oid mergeopfamily; /* btree opfamily relating these ops */
/* valid if clause is mergejoinable, else NIL */
List *mergeopfamilies; /* opfamilies containing clause operator */
/* cache space for mergeclause processing; NIL if not yet set */
List *left_pathkey; /* canonical pathkey for left side */
List *right_pathkey; /* canonical pathkey for right side */
/* cache space for mergeclause processing; NULL if not yet set */
EquivalenceClass *left_ec; /* EquivalenceClass containing lefthand */
EquivalenceClass *right_ec; /* EquivalenceClass containing righthand */
/* cache space for mergeclause processing; -1 if not yet set */
Selectivity left_mergescansel; /* fraction of left side to scan */
Selectivity right_mergescansel; /* fraction of right side to scan */
/* transient workspace for use while considering a specific join path */
bool outer_is_left; /* T = outer var on left, F = on right */
/* valid if clause is hashjoinable, else InvalidOid: */
Oid hashjoinoperator; /* copy of clause operator */

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/optimizer/joininfo.h,v 1.33 2007/01/05 22:19:56 momjian Exp $
* $PostgreSQL: pgsql/src/include/optimizer/joininfo.h,v 1.34 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -23,8 +23,5 @@ extern bool have_relevant_joinclause(PlannerInfo *root,
extern void add_join_clause_to_rels(PlannerInfo *root,
RestrictInfo *restrictinfo,
Relids join_relids);
extern void remove_join_clause_from_rels(PlannerInfo *root,
RestrictInfo *restrictinfo,
Relids join_relids);
#endif /* JOININFO_H */

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/optimizer/pathnode.h,v 1.75 2007/01/10 18:06:04 tgl Exp $
* $PostgreSQL: pgsql/src/include/optimizer/pathnode.h,v 1.76 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -71,9 +71,6 @@ extern MergePath *create_mergejoin_path(PlannerInfo *root,
List *restrict_clauses,
List *pathkeys,
List *mergeclauses,
Oid *mergefamilies,
int *mergestrategies,
bool *mergenullsfirst,
List *outersortkeys,
List *innersortkeys);

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/optimizer/paths.h,v 1.94 2007/01/05 22:19:56 momjian Exp $
* $PostgreSQL: pgsql/src/include/optimizer/paths.h,v 1.95 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -52,6 +52,9 @@ extern List *group_clauses_by_indexkey(IndexOptInfo *index,
Relids outer_relids,
SaOpControl saop_control,
bool *found_clause);
extern bool eclass_matches_any_index(EquivalenceClass *ec,
EquivalenceMember *em,
RelOptInfo *rel);
extern bool match_index_to_operand(Node *operand, int indexcol,
IndexOptInfo *index);
extern List *expand_indexqual_conditions(IndexOptInfo *index,
@ -89,6 +92,37 @@ extern List *make_rels_by_joins(PlannerInfo *root, int level, List **joinrels);
extern RelOptInfo *make_join_rel(PlannerInfo *root,
RelOptInfo *rel1, RelOptInfo *rel2);
/*
* equivclass.c
* routines for managing EquivalenceClasses
*/
extern bool process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
bool below_outer_join);
extern void reconsider_outer_join_clauses(PlannerInfo *root);
extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Expr *expr,
Oid expr_datatype,
List *opfamilies);
extern void generate_base_implied_equalities(PlannerInfo *root);
extern List *generate_join_implied_equalities(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
extern void add_child_rel_equivalences(PlannerInfo *root,
AppendRelInfo *appinfo,
RelOptInfo *parent_rel,
RelOptInfo *child_rel);
extern List *find_eclass_clauses_for_index_join(PlannerInfo *root,
RelOptInfo *rel,
Relids outer_relids);
extern bool have_relevant_eclass_joinclause(PlannerInfo *root,
RelOptInfo *rel1, RelOptInfo *rel2);
extern bool has_relevant_eclass_joinclause(PlannerInfo *root,
RelOptInfo *rel1);
extern bool eclass_useful_for_merging(EquivalenceClass *eclass,
RelOptInfo *rel);
/*
* pathkeys.c
* utilities for matching and building path keys
@ -101,9 +135,6 @@ typedef enum
PATHKEYS_DIFFERENT /* neither pathkey includes the other */
} PathKeysComparison;
extern void add_equijoined_keys(PlannerInfo *root, RestrictInfo *restrictinfo);
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
extern void generate_implied_equalities(PlannerInfo *root);
extern List *canonicalize_pathkeys(PlannerInfo *root, List *pathkeys);
extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
extern bool pathkeys_contained_in(List *keys1, List *keys2);
@ -113,23 +144,29 @@ extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,
List *pathkeys,
double fraction);
extern List *build_index_pathkeys(PlannerInfo *root, IndexOptInfo *index,
ScanDirection scandir, bool canonical);
ScanDirection scandir);
extern List *convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
List *subquery_pathkeys);
extern List *build_join_pathkeys(PlannerInfo *root,
RelOptInfo *joinrel,
JoinType jointype,
List *outer_pathkeys);
extern List *make_pathkeys_for_sortclauses(List *sortclauses,
List *tlist);
extern void cache_mergeclause_pathkeys(PlannerInfo *root,
extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
List *sortclauses,
List *tlist,
bool canonicalize);
extern void cache_mergeclause_eclasses(PlannerInfo *root,
RestrictInfo *restrictinfo);
extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
List *pathkeys,
bool outer_keys,
List *restrictinfos);
extern List *make_pathkeys_for_mergeclauses(PlannerInfo *root,
List *mergeclauses,
RelOptInfo *rel);
extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
List *mergeclauses,
RelOptInfo *joinrel);
extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
List *mergeclauses,
List *outer_pathkeys);
extern int pathkeys_useful_for_merging(PlannerInfo *root,
RelOptInfo *rel,
List *pathkeys);

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/optimizer/planmain.h,v 1.97 2007/01/10 18:06:04 tgl Exp $
* $PostgreSQL: pgsql/src/include/optimizer/planmain.h,v 1.98 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -38,6 +38,8 @@ extern Plan *create_plan(PlannerInfo *root, Path *best_path);
extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
Index scanrelid, Plan *subplan);
extern Append *make_append(List *appendplans, bool isTarget, List *tlist);
extern Sort *make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree,
List *pathkeys);
extern Sort *make_sort_from_sortclauses(PlannerInfo *root, List *sortcls,
Plan *lefttree);
extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls,
@ -69,12 +71,22 @@ extern int join_collapse_limit;
extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
extern List *deconstruct_jointree(PlannerInfo *root);
extern void distribute_restrictinfo_to_rels(PlannerInfo *root,
RestrictInfo *restrictinfo);
extern void process_implied_equality(PlannerInfo *root,
Node *item1, Node *item2,
Oid sortop1, Oid sortop2,
Relids item1_relids, Relids item2_relids,
bool delete_it);
Oid opno,
Expr *item1,
Expr *item2,
Relids qualscope,
bool below_outer_join,
bool both_const);
extern RestrictInfo *build_implied_join_equality(Oid opno,
Expr *item1,
Expr *item2,
Relids qualscope);
/*
* prototypes for plan/setrefs.c

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/optimizer/restrictinfo.h,v 1.39 2007/01/05 22:19:56 momjian Exp $
* $PostgreSQL: pgsql/src/include/optimizer/restrictinfo.h,v 1.40 2007/01/20 20:45:40 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -32,12 +32,8 @@ extern List *extract_actual_clauses(List *restrictinfo_list,
extern void extract_actual_join_clauses(List *restrictinfo_list,
List **joinquals,
List **otherquals);
extern List *remove_redundant_join_clauses(PlannerInfo *root,
List *restrictinfo_list,
bool isouterjoin);
extern List *select_nonredundant_join_clauses(PlannerInfo *root,
List *restrictinfo_list,
List *reference_list,
bool isouterjoin);
List *reference_list);
#endif /* RESTRICTINFO_H */
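
Reviewer note: relation.h in this commit states that multiple clauses with the same parent_ec in the same join are redundant, and that parent_ec is NULL unless a clause is potentially redundant. The sketch below applies just that rule to a flat array: it keeps the first clause seen for each non-NULL parent_ec and keeps every clause whose parent_ec is NULL. `ToyClause` and `toy_drop_redundant` are hypothetical names; the real select_nonredundant_join_clauses takes a reference list as well and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyClause
{
	const char *text;			/* printable form of the clause */
	const void *parent_ec;		/* generating class, or NULL if none */
} ToyClause;

/*
 * Copy the non-redundant clauses from in[] to out[], preserving order.
 * out[] must have room for n entries.  Returns the number kept.
 */
static size_t
toy_drop_redundant(const ToyClause *in, size_t n, ToyClause *out)
{
	size_t		nout = 0;
	size_t		i;
	size_t		j;

	for (i = 0; i < n; i++)
	{
		int			redundant = 0;

		if (in[i].parent_ec != NULL)
		{
			for (j = 0; j < nout; j++)
			{
				if (out[j].parent_ec == in[i].parent_ec)
				{
					redundant = 1;
					break;
				}
			}
		}
		if (!redundant)
			out[nout++] = in[i];
	}
	return nout;
}
```

Given two clauses derived from one class plus two ordinary clauses, three survive: the first derived clause and both ordinary ones.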

@ -6,7 +6,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/utils/lsyscache.h,v 1.112 2007/01/10 18:06:05 tgl Exp $
* $PostgreSQL: pgsql/src/include/utils/lsyscache.h,v 1.113 2007/01/20 20:45:41 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@ -35,12 +35,11 @@ extern void get_op_opfamily_properties(Oid opno, Oid opfamily,
bool *recheck);
extern Oid get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype,
int16 strategy);
extern bool get_op_mergejoin_info(Oid eq_op, Oid *left_sortop,
Oid *right_sortop, Oid *opfamily);
extern bool get_compare_function_for_ordering_op(Oid opno,
Oid *cmpfunc, bool *reverse);
extern Oid get_equality_op_for_ordering_op(Oid opno);
extern Oid get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type);
extern List *get_mergejoin_opfamilies(Oid opno);
extern Oid get_compatible_hash_operator(Oid opno, bool use_lhs_type);
extern Oid get_op_hash_function(Oid opno);
extern void get_op_btree_interpretation(Oid opno,