/*-------------------------------------------------------------------------
 *
 * pathnode.c
 *	  Routines to manipulate pathlists and create path nodes
 *
 * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/optimizer/util/pathnode.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <math.h>

#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"


typedef enum
{
	COSTS_EQUAL,				/* path costs are fuzzily equal */
	COSTS_BETTER1,				/* first path is cheaper than second */
	COSTS_BETTER2,				/* second path is cheaper than first */
	COSTS_DIFFERENT				/* neither path dominates the other on cost */
} PathCostComparison;

/*
 * STD_FUZZ_FACTOR is the normal fuzz factor for compare_path_costs_fuzzily.
 * XXX is it worth making this user-controllable?  It provides a tradeoff
 * between planner runtime and the accuracy of path cost comparisons.
 */
#define STD_FUZZ_FACTOR 1.01

static List *translate_sub_tlist(List *tlist, int relid);

/*****************************************************************************
 *		MISC. PATH UTILITIES
 *****************************************************************************/

/*
 * compare_path_costs
 *	  Return -1, 0, or +1 according as path1 is cheaper, the same cost,
 *	  or more expensive than path2 for the specified criterion.
 */
int
compare_path_costs(Path *path1, Path *path2, CostSelector criterion)
{
	if (criterion == STARTUP_COST)
	{
		if (path1->startup_cost < path2->startup_cost)
			return -1;
		if (path1->startup_cost > path2->startup_cost)
			return +1;

		/*
		 * If paths have the same startup cost (not at all unlikely), order
		 * them by total cost.
		 */
		if (path1->total_cost < path2->total_cost)
			return -1;
		if (path1->total_cost > path2->total_cost)
			return +1;
	}
	else
	{
		if (path1->total_cost < path2->total_cost)
			return -1;
		if (path1->total_cost > path2->total_cost)
			return +1;

		/*
		 * If paths have the same total cost, order them by startup cost.
		 */
		if (path1->startup_cost < path2->startup_cost)
			return -1;
		if (path1->startup_cost > path2->startup_cost)
			return +1;
	}
	return 0;
}

/*
 * compare_fractional_path_costs
 *	  Return -1, 0, or +1 according as path1 is cheaper, the same cost,
 *	  or more expensive than path2 for fetching the specified fraction
 *	  of the total tuples.
 *
 * If fraction is <= 0 or > 1, we interpret it as 1, ie, we select the
 * path with the cheaper total_cost.
 */
int
compare_fractional_path_costs(Path *path1, Path *path2,
							  double fraction)
{
	Cost		cost1,
				cost2;

	if (fraction <= 0.0 || fraction >= 1.0)
		return compare_path_costs(path1, path2, TOTAL_COST);
	cost1 = path1->startup_cost +
		fraction * (path1->total_cost - path1->startup_cost);
	cost2 = path2->startup_cost +
		fraction * (path2->total_cost - path2->startup_cost);
	if (cost1 < cost2)
		return -1;
	if (cost1 > cost2)
		return +1;
	return 0;
}

/*
 * compare_path_costs_fuzzily
 *	  Compare the costs of two paths to see if either can be said to
 *	  dominate the other.
 *
 * We use fuzzy comparisons so that add_path() can avoid keeping both of
 * a pair of paths that really have insignificantly different cost.
 *
 * The fuzz_factor argument must be 1.0 plus delta, where delta is the
 * fraction of the smaller cost that is considered to be a significant
 * difference.  For example, fuzz_factor = 1.01 makes the fuzziness limit
 * be 1% of the smaller cost.
 *
 * The two paths are said to have "equal" costs if both startup and total
 * costs are fuzzily the same.  Path1 is said to be better than path2 if
 * it has fuzzily better startup cost and fuzzily no worse total cost,
 * or if it has fuzzily better total cost and fuzzily no worse startup cost.
 * Path2 is better than path1 if the reverse holds.  Finally, if one path
 * is fuzzily better than the other on startup cost and fuzzily worse on
 * total cost, we just say that their costs are "different", since neither
 * dominates the other across the whole performance spectrum.
 *
 * This function also enforces a policy rule that paths for which the relevant
 * one of parent->consider_startup and parent->consider_param_startup is false
 * cannot survive comparisons solely on the grounds of good startup cost, so
 * we never return COSTS_DIFFERENT when that is true for the total-cost loser.
 * (But if total costs are fuzzily equal, we compare startup costs anyway,
 * in hopes of eliminating one path or the other.)
 */
static PathCostComparison
compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
{
#define CONSIDER_PATH_STARTUP_COST(p)  \
	((p)->param_info == NULL ? (p)->parent->consider_startup : (p)->parent->consider_param_startup)

	/*
	 * Check total cost first since it's more likely to be different; many
	 * paths have zero startup cost.
	 */
	if (path1->total_cost > path2->total_cost * fuzz_factor)
	{
		/* path1 fuzzily worse on total cost */
		if (CONSIDER_PATH_STARTUP_COST(path1) &&
			path2->startup_cost > path1->startup_cost * fuzz_factor)
		{
			/* ... but path2 fuzzily worse on startup, so DIFFERENT */
			return COSTS_DIFFERENT;
		}
		/* else path2 dominates */
		return COSTS_BETTER2;
	}
	if (path2->total_cost > path1->total_cost * fuzz_factor)
	{
		/* path2 fuzzily worse on total cost */
		if (CONSIDER_PATH_STARTUP_COST(path2) &&
			path1->startup_cost > path2->startup_cost * fuzz_factor)
		{
			/* ... but path1 fuzzily worse on startup, so DIFFERENT */
			return COSTS_DIFFERENT;
		}
		/* else path1 dominates */
		return COSTS_BETTER1;
	}
|
Fix some questionable edge-case behaviors in add_path() and friends.
add_path_precheck was doing exact comparisons of path costs, but it really
needs to do them fuzzily to be sure it won't reject paths that could
survive add_path's comparisons. (This can only matter if the initial cost
estimate is very close to the final one, but that turns out to often be
true.)
Also, it should ignore startup cost for this purpose if and only if
compare_path_costs_fuzzily would do so. The previous coding always ignored
startup cost for parameterized paths, which is wrong as of commit
3f59be836c555fa6; it could result in improper early rejection of paths that
we care about for SEMI/ANTI joins. It also always considered startup cost
for unparameterized paths, which is just as wrong though the only effect is
to waste planner cycles on paths that can't survive. Instead, it should
consider startup cost only when directed to by the consider_startup/
consider_param_startup relation flags.
Likewise, compare_path_costs_fuzzily should have symmetrical behavior
for parameterized and unparameterized paths. In this case, the best
answer seems to be that after establishing that total costs are fuzzily
equal, we should compare startup costs whether or not the consider_xxx
flags are on. That is what it's always done for unparameterized paths,
so let's make the behavior for parameterized paths match.
These issues were noted while developing the SEMI/ANTI join costing fix
of commit 3f59be836c555fa6, but we chose not to back-patch these fixes,
because they can cause changes in the planner's choices among
nearly-same-cost plans. (There is in fact one minor change in plan choice
within the core regression tests.) Destabilizing plan choices in back
branches without very clear improvements is frowned on, so we'll just fix
this in HEAD.
2015-06-04 00:02:39 +02:00
|
|
|
/* fuzzily the same on total cost ... */
|
|
|
|
if (path1->startup_cost > path2->startup_cost * fuzz_factor)
|
2012-01-28 01:26:38 +01:00
|
|
|
{
|
|
|
|
/* ... but path1 fuzzily worse on startup, so path2 wins */
|
|
|
|
return COSTS_BETTER2;
|
|
|
|
}
|
Fix some questionable edge-case behaviors in add_path() and friends.
add_path_precheck was doing exact comparisons of path costs, but it really
needs to do them fuzzily to be sure it won't reject paths that could
survive add_path's comparisons. (This can only matter if the initial cost
estimate is very close to the final one, but that turns out to often be
true.)
Also, it should ignore startup cost for this purpose if and only if
compare_path_costs_fuzzily would do so. The previous coding always ignored
startup cost for parameterized paths, which is wrong as of commit
3f59be836c555fa6; it could result in improper early rejection of paths that
we care about for SEMI/ANTI joins. It also always considered startup cost
for unparameterized paths, which is just as wrong though the only effect is
to waste planner cycles on paths that can't survive. Instead, it should
consider startup cost only when directed to by the consider_startup/
consider_param_startup relation flags.
Likewise, compare_path_costs_fuzzily should have symmetrical behavior
for parameterized and unparameterized paths. In this case, the best
answer seems to be that after establishing that total costs are fuzzily
equal, we should compare startup costs whether or not the consider_xxx
flags are on. That is what it's always done for unparameterized paths,
so let's make the behavior for parameterized paths match.
These issues were noted while developing the SEMI/ANTI join costing fix
of commit 3f59be836c555fa6, but we chose not to back-patch these fixes,
because they can cause changes in the planner's choices among
nearly-same-cost plans. (There is in fact one minor change in plan choice
within the core regression tests.) Destabilizing plan choices in back
branches without very clear improvements is frowned on, so we'll just fix
this in HEAD.
2015-06-04 00:02:39 +02:00
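The comparison rule this commit settles on — discriminate on total cost fuzzily first, and fall back to startup cost only when the totals are fuzzily equal — can be sketched outside the planner as a stand-alone model. All names here (`FuzzPath`, `FUZZ_FACTOR`, the `FUZZ_*` result codes) are illustrative stand-ins, not the planner's actual types; the 1% fuzz factor mirrors the default used by add_path.

```c
/* Illustrative stand-ins for the planner's Path costs and COSTS_* codes. */
typedef struct FuzzPath
{
	double		startup_cost;
	double		total_cost;
} FuzzPath;

enum fuzz_result
{
	FUZZ_BETTER1,				/* first path dominates */
	FUZZ_BETTER2,				/* second path dominates */
	FUZZ_EQUAL					/* fuzzily the same on both costs */
};

#define FUZZ_FACTOR 1.01		/* treat costs within 1% as equal */

static enum fuzz_result
fuzzy_compare(const FuzzPath *p1, const FuzzPath *p2)
{
	/* First discriminate on total cost, with the fuzz factor applied. */
	if (p1->total_cost > p2->total_cost * FUZZ_FACTOR)
		return FUZZ_BETTER2;
	if (p2->total_cost > p1->total_cost * FUZZ_FACTOR)
		return FUZZ_BETTER1;

	/* Fuzzily the same on total cost: fall back to startup cost. */
	if (p1->startup_cost > p2->startup_cost * FUZZ_FACTOR)
		return FUZZ_BETTER2;
	if (p2->startup_cost > p1->startup_cost * FUZZ_FACTOR)
		return FUZZ_BETTER1;

	return FUZZ_EQUAL;
}
```

Note that two paths whose totals differ by well under 1% but whose startup costs differ sharply are still distinguished, which is exactly the symmetric behavior the commit message argues for.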
	if (path2->startup_cost > path1->startup_cost * fuzz_factor)
	{
		/* ... but path2 fuzzily worse on startup, so path1 wins */
		return COSTS_BETTER1;
	}
	/* fuzzily the same on both costs */
	return COSTS_EQUAL;
Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.
When the inner side of a nestloop SEMI or ANTI join is an indexscan that
uses all the join clauses as indexquals, it can be presumed that both
matched and unmatched outer rows will be processed very quickly: for
matched rows, we'll stop after fetching one row from the indexscan, while
for unmatched rows we'll have an indexscan that finds no matching index
entries, which should also be quick. The planner already knew about this,
but it was nonetheless charging for at least one full run of the inner
indexscan, as a consequence of concerns about the behavior of materialized
inner scans --- but those concerns don't apply in the fast case. If the
inner side has low cardinality (many matching rows) this could make an
indexscan plan look far more expensive than it actually is. To fix,
rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
don't add the inner scan cost until we've inspected the indexquals, and
then we can add either the full-run cost or just the first tuple's cost as
appropriate.
Experimentation with this fix uncovered another problem: add_path and
friends were coded to disregard cheap startup cost when considering
parameterized paths. That's usually okay (and desirable, because it thins
the path herd faster); but in this fast case for SEMI/ANTI joins, it could
result in throwing away the desired plain indexscan path in favor of a
bitmap scan path before we ever get to the join costing logic. In the
many-matching-rows cases of interest here, a bitmap scan will do a lot more
work than required, so this is a problem. To fix, add a per-relation flag
consider_param_startup that works like the existing consider_startup flag,
but applies to parameterized paths, and set it for relations that are the
inside of a SEMI or ANTI join.
To make this patch reasonably safe to back-patch, care has been taken to
avoid changing the planner's behavior except in the very narrow case of
SEMI/ANTI joins with inner indexscans. There are places in
compare_path_costs_fuzzily and add_path_precheck that are not terribly
consistent with the new approach, but changing them will affect planner
decisions at the margins in other cases, so we'll leave that for a
HEAD-only fix.
Back-patch to 9.3; before that, the consider_startup flag didn't exist,
meaning that the second aspect of the patch would be too invasive.
Per a complaint from Peter Holzer and analysis by Tomas Vondra.
2015-06-03 17:58:47 +02:00
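The costing change described above amounts to simple arithmetic: when every join clause is enforced as an indexqual, each outer row is expected to stop after at most one returned inner tuple, so the inner scan should be charged roughly its first-tuple cost rather than a full run. A simplified stand-alone model (the function name and the linear run-cost interpolation are my own illustration, not the planner's exact initial_cost_nestloop/final_cost_nestloop formulas):

```c
#include <stdbool.h>

/*
 * Simplified model of the nestloop inner-scan charge for SEMI/ANTI joins.
 * If the inner indexscan enforces all the join clauses as indexquals, we
 * expect to stop after at most one returned tuple per outer row, so charge
 * only the fraction of the run cost attributable to the first tuple.
 */
static double
semi_inner_scan_cost(double inner_startup_cost,
					 double inner_run_cost,
					 double inner_rows,
					 bool all_join_quals_are_indexquals)
{
	if (all_join_quals_are_indexquals && inner_rows > 1.0)
		return inner_startup_cost + inner_run_cost / inner_rows;

	/* otherwise, charge a full run of the inner scan */
	return inner_startup_cost + inner_run_cost;
}
```

With 100 matching inner rows this cuts the per-outer-row charge by roughly two orders of magnitude, which is why the old full-run charge could make an indexscan plan look far more expensive than it actually is.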
#undef CONSIDER_PATH_STARTUP_COST
}

/*
 * set_cheapest
 *	  Find the minimum-cost paths from among a relation's paths,
 *	  and save them in the rel's cheapest-path fields.
 *
Adjust definition of cheapest_total_path to work better with LATERAL.
In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths. It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization. (The first rule is actually
a special case of the second.)
This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available. This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.
Along the way, simplify management of parameterized paths in add_path()
and friends. In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible. But we didn't take that reasoning as far as
we should have. Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well. Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels. So this
should be a bit faster as well as simpler.
2012-08-30 04:05:27 +02:00
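The selection rule this commit establishes — take the cheapest unparameterized path if one exists, else the cheapest-total path among those of the minimum available parameterization — can be modeled with a plain bitmask standing in for the planner's Bitmapset of required outer rels. This is a sketch under that assumption; `MiniPath` and `pick_cheapest_total` are illustrative names, and the incremental subset tracking mirrors the switch on bms_subset_compare in set_cheapest below.

```c
#include <stddef.h>

typedef struct MiniPath
{
	unsigned	req_outer;		/* bitmask of required outer rels; 0 = none */
	double		total_cost;
} MiniPath;

/*
 * Pick a cheapest-total path: the cheapest unparameterized path if there
 * is one; otherwise the cheapest path among the minimally-parameterized
 * candidates, tracked incrementally as in set_cheapest.
 */
static const MiniPath *
pick_cheapest_total(const MiniPath *paths, size_t npaths)
{
	const MiniPath *cheapest_unparam = NULL;
	const MiniPath *best_param = NULL;
	size_t		i;

	for (i = 0; i < npaths; i++)
	{
		const MiniPath *p = &paths[i];

		if (p->req_outer == 0)
		{
			if (cheapest_unparam == NULL ||
				p->total_cost < cheapest_unparam->total_cost)
				cheapest_unparam = p;
			continue;
		}
		if (cheapest_unparam != NULL)
			continue;			/* no longer care about parameterized paths */
		if (best_param == NULL)
			best_param = p;
		else if (p->req_outer == best_param->req_outer)
		{
			/* same parameterization: keep the cheaper one */
			if (p->total_cost < best_param->total_cost)
				best_param = p;
		}
		else if ((p->req_outer & best_param->req_outer) == p->req_outer)
			best_param = p;		/* new path is less-parameterized */
		/* else: old path is a subset or the sets differ; keep the old one */
	}
	return cheapest_unparam ? cheapest_unparam : best_param;
}
```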
 * cheapest_total_path is normally the cheapest-total-cost unparameterized
 * path; but if there are no unparameterized paths, we assign it to be the
 * best (cheapest least-parameterized) parameterized path.  However, only
 * unparameterized paths are considered candidates for cheapest_startup_path,
 * so that will be NULL if there are no unparameterized paths.
 *
 * The cheapest_parameterized_paths list collects all parameterized paths
 * that have survived the add_path() tournament for this relation.  (Since
 * add_path ignores pathkeys for a parameterized path, these will be paths
 * that have best cost or best row count for their parameterization.  We
 * may also have both a parallel-safe and a non-parallel-safe path for the
 * same parameterization in some cases, but this should be relatively rare
 * since, most typically, all paths for the same relation will be
 * parallel-safe or none of them will.)
 *
 * cheapest_parameterized_paths always includes the cheapest-total
 * unparameterized path, too, if there is one; the users of that list find
 * it more convenient if that's included.
 *
 * This is normally called only after we've finished constructing the path
 * list for the rel node.
 */
void
set_cheapest(RelOptInfo *parent_rel)
{
	Path	   *cheapest_startup_path;
	Path	   *cheapest_total_path;
	Path	   *best_param_path;
	List	   *parameterized_paths;
	ListCell   *p;

	Assert(IsA(parent_rel, RelOptInfo));

	if (parent_rel->pathlist == NIL)
		elog(ERROR, "could not devise a query plan for the given query");

	cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
	parameterized_paths = NIL;

	foreach(p, parent_rel->pathlist)
	{
		Path	   *path = (Path *) lfirst(p);
		int			cmp;

Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
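The cached-rowcount property this commit introduces — every path with the same required-outer set gets the identical estimate — can be sketched as a small memo table. This is illustrative only: the real planner caches the estimate in ParamPathInfo nodes attached to the RelOptInfo, while here a fixed-size array keyed by a required-outer bitmask and a toy estimator stand in for that machinery.

```c
#include <stddef.h>

#define MAX_CACHED 32

/* Memo of rowcount estimates, keyed by required-outer-rels bitmask. */
typedef struct RowcountCache
{
	unsigned	keys[MAX_CACHED];
	double		rows[MAX_CACHED];
	size_t		n;
} RowcountCache;

static int	estimate_calls = 0;

/* Toy estimator: pretend more outer rels (more join clauses) mean fewer rows. */
static double
toy_estimate(unsigned req_outer)
{
	estimate_calls++;
	return 1000.0 / (double) (req_outer + 1);
}

/*
 * Return the cached estimate for this parameterization, computing and
 * caching it on first use, so every path with the same required-outer
 * set sees the identical rowcount.  (No overflow check: a sketch only.)
 */
static double
get_param_rows(RowcountCache *cache, unsigned req_outer,
			   double (*estimate) (unsigned req_outer))
{
	size_t		i;

	for (i = 0; i < cache->n; i++)
	{
		if (cache->keys[i] == req_outer)
			return cache->rows[i];
	}
	cache->keys[cache->n] = req_outer;
	cache->rows[cache->n] = estimate(req_outer);
	cache->n++;
	return cache->rows[cache->n - 1];
}
```

Because the estimate is computed once per parameterization, add_path_precheck can safely assume that paths with the same required-outer set have the same rowcount without recomputing anything.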
		if (path->param_info)
		{
			/* Parameterized path, so add it to parameterized_paths */
			parameterized_paths = lappend(parameterized_paths, path);

			/*
			 * If we have an unparameterized cheapest-total, we no longer care
			 * about finding the best parameterized path, so move on.
			 */
			if (cheapest_total_path)
				continue;

			/*
			 * Otherwise, track the best parameterized path, which is the one
			 * with least total cost among those of the minimum
			 * parameterization.
			 */
			if (best_param_path == NULL)
				best_param_path = path;
			else
			{
				switch (bms_subset_compare(PATH_REQ_OUTER(path),
										   PATH_REQ_OUTER(best_param_path)))
				{
					case BMS_EQUAL:
						/* keep the cheaper one */
						if (compare_path_costs(path, best_param_path,
											   TOTAL_COST) < 0)
							best_param_path = path;
						break;
					case BMS_SUBSET1:
						/* new path is less-parameterized */
						best_param_path = path;
						break;
					case BMS_SUBSET2:
						/* old path is less-parameterized, keep it */
						break;
					case BMS_DIFFERENT:
						/*
						 * This means that neither path has the least possible
						 * parameterization for the rel.  We'll sit on the old
						 * path until something better comes along.
						 */
						break;
				}
			}
		}
		else
		{
			/* Unparameterized path, so consider it for cheapest slots */
			if (cheapest_total_path == NULL)
			{
				cheapest_startup_path = cheapest_total_path = path;
				continue;
			}

|
|
|
/*
|
|
|
|
* If we find two paths of identical costs, try to keep the
|
|
|
|
* better-sorted one. The paths might have unrelated sort
|
|
|
|
* orderings, in which case we can only guess which might be
|
|
|
|
* better to keep, but if one is superior then we definitely
|
|
|
|
* should keep that one.
|
|
|
|
*/
|
|
|
|
cmp = compare_path_costs(cheapest_startup_path, path, STARTUP_COST);
|
|
|
|
if (cmp > 0 ||
|
|
|
|
(cmp == 0 &&
|
|
|
|
compare_pathkeys(cheapest_startup_path->pathkeys,
|
|
|
|
path->pathkeys) == PATHKEYS_BETTER2))
|
|
|
|
cheapest_startup_path = path;
|
|
|
|
|
|
|
|
cmp = compare_path_costs(cheapest_total_path, path, TOTAL_COST);
|
|
|
|
if (cmp > 0 ||
|
|
|
|
(cmp == 0 &&
|
|
|
|
compare_pathkeys(cheapest_total_path->pathkeys,
|
|
|
|
path->pathkeys) == PATHKEYS_BETTER2))
|
|
|
|
cheapest_total_path = path;
|
|
|
|
}
|
1996-07-09 08:22:35 +02:00
|
|
|
}

	/* Add cheapest unparameterized path, if any, to parameterized_paths */
	if (cheapest_total_path)
		parameterized_paths = lcons(cheapest_total_path, parameterized_paths);

	/*
	 * If there is no unparameterized path, use the best parameterized path as
	 * cheapest_total_path (but not as cheapest_startup_path).
	 */
	if (cheapest_total_path == NULL)
		cheapest_total_path = best_param_path;
	Assert(cheapest_total_path != NULL);

	parent_rel->cheapest_startup_path = cheapest_startup_path;
	parent_rel->cheapest_total_path = cheapest_total_path;
	parent_rel->cheapest_unique_path = NULL;	/* computed only if needed */
	parent_rel->cheapest_parameterized_paths = parameterized_paths;
}

/*
 * add_path
 *	  Consider a potential implementation path for the specified parent rel,
 *	  and add it to the rel's pathlist if it is worthy of consideration.
 *	  A path is worthy if it has a better sort order (better pathkeys) or
 *	  cheaper cost (on either dimension), or generates fewer rows, than any
 *	  existing path that has the same or superset parameterization rels.
 *	  We also consider parallel-safe paths more worthy than others.
 *
 *	  We also remove from the rel's pathlist any old paths that are dominated
 *	  by new_path --- that is, new_path is cheaper, at least as well ordered,
 *	  generates no more rows, requires no outer rels not required by the old
 *	  path, and is no less parallel-safe.
 *
 *	  In most cases, a path with a superset parameterization will generate
 *	  fewer rows (since it has more join clauses to apply), so that those two
 *	  figures of merit move in opposite directions; this means that a path of
 *	  one parameterization can seldom dominate a path of another.  But such
 *	  cases do arise, so we make the full set of checks anyway.
 *
 *	  There are two policy decisions embedded in this function, along with
 *	  its sibling add_path_precheck.  First, we treat all parameterized paths
 *	  as having NIL pathkeys, so that they cannot win comparisons on the
 *	  basis of sort order.  This is to reduce the number of parameterized
 *	  paths that are kept; see discussion in src/backend/optimizer/README.
 *
 *	  Second, we only consider cheap startup cost to be interesting if
 *	  parent_rel->consider_startup is true for an unparameterized path, or
 *	  parent_rel->consider_param_startup is true for a parameterized one.
 *	  Again, this allows discarding useless paths sooner.
 *
 *	  The pathlist is kept sorted by total_cost, with cheaper paths
 *	  at the front.  Within this routine, that's simply a speed hack:
 *	  doing it that way makes it more likely that we will reject an inferior
 *	  path after a few comparisons, rather than many comparisons.
 *	  However, add_path_precheck relies on this ordering to exit early
 *	  when possible.
 *
 *	  NOTE: discarded Path objects are immediately pfree'd to reduce planner
 *	  memory consumption.  We dare not try to free the substructure of a Path,
 *	  since much of it may be shared with other Paths or the query tree itself;
 *	  but just recycling discarded Path nodes is a very useful savings in
 *	  a large join tree.  We can recycle the List nodes of pathlist, too.
 *
 *	  BUT: we do not pfree IndexPath objects, since they may be referenced as
 *	  children of BitmapHeapPaths as well as being paths in their own right.
 *
 *	  'parent_rel' is the relation entry to which the path corresponds.
 *	  'new_path' is a potential path for parent_rel.
 *
 *	  Returns nothing, but modifies parent_rel->pathlist.
 */
void
add_path(RelOptInfo *parent_rel, Path *new_path)
{
	bool		accept_new = true;	/* unless we find a superior old path */
	ListCell   *insert_after = NULL;	/* where to insert new item */
	List	   *new_path_pathkeys;
	ListCell   *p1;
	ListCell   *p1_prev;
	ListCell   *p1_next;

	/*
	 * This is a convenient place to check for query cancel --- no part of the
	 * planner goes very long without calling add_path().
	 */
	CHECK_FOR_INTERRUPTS();

	/* Pretend parameterized paths have no pathkeys, per comment above */
	new_path_pathkeys = new_path->param_info ? NIL : new_path->pathkeys;

	/*
	 * Loop to check proposed new path against old paths.  Note it is possible
	 * for more than one old path to be tossed out because new_path dominates
	 * it.
	 *
	 * We can't use foreach here because the loop body may delete the current
	 * list cell.
	 */
	p1_prev = NULL;
	for (p1 = list_head(parent_rel->pathlist); p1 != NULL; p1 = p1_next)
	{
		Path	   *old_path = (Path *) lfirst(p1);
		bool		remove_old = false; /* unless new proves superior */
		PathCostComparison costcmp;
		PathKeysComparison keyscmp;
		BMS_Comparison outercmp;

		p1_next = lnext(p1);

		/*
		 * Do a fuzzy cost comparison with standard fuzziness limit.
		 */
		costcmp = compare_path_costs_fuzzily(new_path, old_path,
											 STD_FUZZ_FACTOR);

		/*
		 * If the two paths compare differently for startup and total cost,
		 * then we want to keep both, and we can skip comparing pathkeys and
		 * required_outer rels.  If they compare the same, proceed with the
		 * other comparisons.  Row count is checked last.  (We make the tests
		 * in this order because the cost comparison is most likely to turn
		 * out "different", and the pathkeys comparison next most likely.  As
		 * explained above, row count very seldom makes a difference, so even
		 * though it's cheap to compare there's not much point in checking it
|
|
|
|
* earlier.)
|
2000-02-15 21:49:31 +01:00
|
|
|
*/
|
2012-01-28 01:26:38 +01:00
|
|
|
if (costcmp != COSTS_DIFFERENT)
|
1999-02-09 04:51:42 +01:00
|
|
|
{
|
2012-01-28 01:26:38 +01:00
|
|
|
/* Similarly check to see if either dominates on pathkeys */
|
|
|
|
List *old_path_pathkeys;
|
|
|
|
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
old_path_pathkeys = old_path->param_info ? NIL : old_path->pathkeys;
|
2012-01-28 01:26:38 +01:00
|
|
|
keyscmp = compare_pathkeys(new_path_pathkeys,
|
|
|
|
old_path_pathkeys);
|
|
|
|
if (keyscmp != PATHKEYS_DIFFERENT)
|
2000-02-15 21:49:31 +01:00
|
|
|
{
|
2012-01-28 01:26:38 +01:00
|
|
|
switch (costcmp)
|
|
|
|
{
|
|
|
|
case COSTS_EQUAL:
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path),
|
2012-06-10 21:20:04 +02:00
|
|
|
PATH_REQ_OUTER(old_path));
|
2012-01-28 01:26:38 +01:00
|
|
|
if (keyscmp == PATHKEYS_BETTER1)
|
|
|
|
{
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
if ((outercmp == BMS_EQUAL ||
|
|
|
|
outercmp == BMS_SUBSET1) &&
|
2016-01-20 20:29:22 +01:00
|
|
|
new_path->rows <= old_path->rows &&
|
|
|
|
new_path->parallel_safe >= old_path->parallel_safe)
|
2012-06-10 21:20:04 +02:00
|
|
|
remove_old = true; /* new dominates old */
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
else if (keyscmp == PATHKEYS_BETTER2)
|
|
|
|
{
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
if ((outercmp == BMS_EQUAL ||
|
|
|
|
outercmp == BMS_SUBSET2) &&
|
2016-01-20 20:29:22 +01:00
|
|
|
new_path->rows >= old_path->rows &&
|
|
|
|
new_path->parallel_safe <= old_path->parallel_safe)
|
2012-06-10 21:20:04 +02:00
|
|
|
accept_new = false; /* old dominates new */
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
else /* keyscmp == PATHKEYS_EQUAL */
|
|
|
|
{
|
|
|
|
if (outercmp == BMS_EQUAL)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Same pathkeys and outer rels, and fuzzily
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
* the same cost, so keep just one; to decide
|
2016-01-20 20:29:22 +01:00
|
|
|
* which, first check parallel-safety, then
|
|
|
|
* rows, then do a fuzzy cost comparison with
|
|
|
|
* very small fuzz limit. (We used to do an
|
|
|
|
* exact cost comparison, but that results in
|
|
|
|
* annoying platform-specific plan variations
|
|
|
|
* due to roundoff in the cost estimates.) If
|
|
|
|
* things are still tied, arbitrarily keep
|
|
|
|
* only the old path. Notice that we will
|
|
|
|
* keep only the old path even if the
|
|
|
|
* less-fuzzy comparison decides the startup
|
|
|
|
* and total costs compare differently.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
2016-01-20 20:29:22 +01:00
|
|
|
if (new_path->parallel_safe >
|
|
|
|
old_path->parallel_safe)
|
|
|
|
remove_old = true; /* new dominates old */
|
|
|
|
else if (new_path->parallel_safe <
|
|
|
|
old_path->parallel_safe)
|
|
|
|
accept_new = false; /* old dominates new */
|
|
|
|
else if (new_path->rows < old_path->rows)
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
remove_old = true; /* new dominates old */
|
|
|
|
else if (new_path->rows > old_path->rows)
|
2012-06-10 21:20:04 +02:00
|
|
|
accept_new = false; /* old dominates new */
|
2012-09-02 00:16:24 +02:00
|
|
|
else if (compare_path_costs_fuzzily(new_path,
|
|
|
|
old_path,
|
Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.
When the inner side of a nestloop SEMI or ANTI join is an indexscan that
uses all the join clauses as indexquals, it can be presumed that both
matched and unmatched outer rows will be processed very quickly: for
matched rows, we'll stop after fetching one row from the indexscan, while
for unmatched rows we'll have an indexscan that finds no matching index
entries, which should also be quick. The planner already knew about this,
but it was nonetheless charging for at least one full run of the inner
indexscan, as a consequence of concerns about the behavior of materialized
inner scans --- but those concerns don't apply in the fast case. If the
inner side has low cardinality (many matching rows) this could make an
indexscan plan look far more expensive than it actually is. To fix,
rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
don't add the inner scan cost until we've inspected the indexquals, and
then we can add either the full-run cost or just the first tuple's cost as
appropriate.
Experimentation with this fix uncovered another problem: add_path and
friends were coded to disregard cheap startup cost when considering
parameterized paths. That's usually okay (and desirable, because it thins
the path herd faster); but in this fast case for SEMI/ANTI joins, it could
result in throwing away the desired plain indexscan path in favor of a
bitmap scan path before we ever get to the join costing logic. In the
many-matching-rows cases of interest here, a bitmap scan will do a lot more
work than required, so this is a problem. To fix, add a per-relation flag
consider_param_startup that works like the existing consider_startup flag,
but applies to parameterized paths, and set it for relations that are the
inside of a SEMI or ANTI join.
To make this patch reasonably safe to back-patch, care has been taken to
avoid changing the planner's behavior except in the very narrow case of
SEMI/ANTI joins with inner indexscans. There are places in
compare_path_costs_fuzzily and add_path_precheck that are not terribly
consistent with the new approach, but changing them will affect planner
decisions at the margins in other cases, so we'll leave that for a
HEAD-only fix.
Back-patch to 9.3; before that, the consider_startup flag didn't exist,
meaning that the second aspect of the patch would be too invasive.
Per a complaint from Peter Holzer and analysis by Tomas Vondra.
2015-06-03 17:58:47 +02:00
|
|
|
1.0000000001) == COSTS_BETTER1)
|
2012-01-28 01:26:38 +01:00
|
|
|
remove_old = true; /* new dominates old */
|
|
|
|
else
|
2012-06-10 21:20:04 +02:00
|
|
|
accept_new = false; /* old equals or
|
|
|
|
* dominates new */
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
else if (outercmp == BMS_SUBSET1 &&
|
2016-01-20 20:29:22 +01:00
|
|
|
new_path->rows <= old_path->rows &&
|
|
|
|
new_path->parallel_safe >= old_path->parallel_safe)
|
2012-06-10 21:20:04 +02:00
|
|
|
remove_old = true; /* new dominates old */
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
else if (outercmp == BMS_SUBSET2 &&
|
2016-01-20 20:29:22 +01:00
|
|
|
new_path->rows >= old_path->rows &&
|
|
|
|
new_path->parallel_safe <= old_path->parallel_safe)
|
2012-06-10 21:20:04 +02:00
|
|
|
accept_new = false; /* old dominates new */
|
2012-01-28 01:26:38 +01:00
|
|
|
/* else different parameterizations, keep both */
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case COSTS_BETTER1:
|
|
|
|
if (keyscmp != PATHKEYS_BETTER2)
|
|
|
|
{
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path),
|
2012-06-10 21:20:04 +02:00
|
|
|
PATH_REQ_OUTER(old_path));
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
if ((outercmp == BMS_EQUAL ||
|
|
|
|
outercmp == BMS_SUBSET1) &&
|
2016-01-20 20:29:22 +01:00
|
|
|
new_path->rows <= old_path->rows &&
|
|
|
|
new_path->parallel_safe >= old_path->parallel_safe)
|
2012-06-10 21:20:04 +02:00
|
|
|
remove_old = true; /* new dominates old */
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
case COSTS_BETTER2:
|
|
|
|
if (keyscmp != PATHKEYS_BETTER1)
|
|
|
|
{
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path),
|
2012-06-10 21:20:04 +02:00
|
|
|
PATH_REQ_OUTER(old_path));
							if ((outercmp == BMS_EQUAL ||
								 outercmp == BMS_SUBSET2) &&
								new_path->rows >= old_path->rows &&
								new_path->parallel_safe <= old_path->parallel_safe)
								accept_new = false; /* old dominates new */
						}
						break;
					case COSTS_DIFFERENT:

						/*
						 * can't get here, but keep this case to keep compiler
						 * quiet
						 */
						break;
				}
			}
		}

		/*
		 * Remove current element from pathlist if dominated by new.
		 */
		if (remove_old)
		{
			parent_rel->pathlist = list_delete_cell(parent_rel->pathlist,
													p1, p1_prev);

			/*
			 * Delete the data pointed-to by the deleted cell, if possible
			 */
			if (!IsA(old_path, IndexPath))
				pfree(old_path);
			/* p1_prev does not advance */
		}
		else
		{
			/* new belongs after this old path if it has cost >= old's */
			if (new_path->total_cost >= old_path->total_cost)
				insert_after = p1;
			/* p1_prev advances */
			p1_prev = p1;
		}

		/*
		 * If we found an old path that dominates new_path, we can quit
		 * scanning the pathlist; we will not add new_path, and we assume
		 * new_path cannot dominate any other elements of the pathlist.
		 */
		if (!accept_new)
			break;
	}

	if (accept_new)
	{
		/* Accept the new path: insert it at proper place in pathlist */
		if (insert_after)
			lappend_cell(parent_rel->pathlist, insert_after, new_path);
		else
			parent_rel->pathlist = lcons(new_path, parent_rel->pathlist);
	}
	else
	{
		/* Reject and recycle the new path */
		if (!IsA(new_path, IndexPath))
			pfree(new_path);
	}
}

/*
 * add_path_precheck
 *	  Check whether a proposed new path could possibly get accepted.
 *	  We assume we know the path's pathkeys and parameterization accurately,
 *	  and have lower bounds for its costs.
 *
 * Note that we do not know the path's rowcount, since getting an estimate for
 * that is too expensive to do before prechecking.  We assume here that paths
 * of a superset parameterization will generate fewer rows; if that holds,
 * then paths with different parameterizations cannot dominate each other
 * and so we can simply ignore existing paths of another parameterization.
 * (In the infrequent cases where that rule of thumb fails, add_path will
 * get rid of the inferior path.)
 *
 * At the time this is called, we haven't actually built a Path structure,
 * so the required information has to be passed piecemeal.
 */
bool
add_path_precheck(RelOptInfo *parent_rel,
				  Cost startup_cost, Cost total_cost,
				  List *pathkeys, Relids required_outer)
{
	List	   *new_path_pathkeys;
	bool		consider_startup;
	ListCell   *p1;

	/* Pretend parameterized paths have no pathkeys, per add_path policy */
	new_path_pathkeys = required_outer ? NIL : pathkeys;

	/* Decide whether new path's startup cost is interesting */
	consider_startup = required_outer ? parent_rel->consider_param_startup :
		parent_rel->consider_startup;

	foreach(p1, parent_rel->pathlist)
	{
		Path	   *old_path = (Path *) lfirst(p1);
		PathKeysComparison keyscmp;

		/*
		 * We are looking for an old_path with the same parameterization (and
		 * by assumption the same rowcount) that dominates the new path on
		 * pathkeys as well as both cost metrics.  If we find one, we can
		 * reject the new path.
		 *
		 * Cost comparisons here should match compare_path_costs_fuzzily.
		 */
		if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
		{
			/* new path can win on startup cost only if consider_startup */
			if (startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR ||
				!consider_startup)
			{
				/* new path loses on cost, so check pathkeys... */
				List	   *old_path_pathkeys;

				old_path_pathkeys = old_path->param_info ? NIL : old_path->pathkeys;
				keyscmp = compare_pathkeys(new_path_pathkeys,
										   old_path_pathkeys);
				if (keyscmp == PATHKEYS_EQUAL ||
					keyscmp == PATHKEYS_BETTER2)
				{
					/* new path does not win on pathkeys... */
					if (bms_equal(required_outer, PATH_REQ_OUTER(old_path)))
					{
						/* Found an old path that dominates the new one */
						return false;
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
}
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
2012-06-10 21:20:04 +02:00
|
|
|
* Since the pathlist is sorted by total_cost, we can stop looking
|
|
|
|
* once we reach a path with a total_cost larger than the new
|
|
|
|
* path's.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/*
|
|
|
|
* add_partial_path
|
|
|
|
* Like add_path, our goal here is to consider whether a path is worthy
|
|
|
|
* of being kept around, but the considerations here are a bit different.
|
|
|
|
* A partial path is one which can be executed in any number of workers in
|
|
|
|
* parallel such that each worker will generate a subset of the path's
|
|
|
|
* overall result.
|
|
|
|
*
|
|
|
|
* We don't generate parameterized partial paths for several reasons. Most
|
|
|
|
* importantly, they're not safe to execute, because there's nothing to
|
|
|
|
* make sure that a parallel scan within the parameterized portion of the
|
|
|
|
* plan is running with the same value in every worker at the same time.
|
|
|
|
* Fortunately, it seems unlikely to be worthwhile anyway, because having
|
|
|
|
* each worker scan the entire outer relation and a subset of the inner
|
|
|
|
* relation will generally be a terrible plan. The inner (parameterized)
|
|
|
|
* side of the plan will be small anyway. There could be rare cases where
|
|
|
|
* this wins big - e.g. if join order constraints put a 1-row relation on
|
|
|
|
* the outer side of the topmost join with a parameterized plan on the inner
|
|
|
|
* side - but we'll have to be content not to handle such cases until somebody
|
|
|
|
* builds an executor infrastructure that can cope with them.
|
|
|
|
*
|
|
|
|
* Because we don't consider parameterized paths here, we also don't
|
|
|
|
* need to consider the row counts as a measure of quality: every path will
|
|
|
|
* produce the same number of rows. Neither do we need to consider startup
|
|
|
|
* costs: parallelism is only used for plans that will be run to completion.
|
|
|
|
* Therefore, this routine is much simpler than add_path: it needs to
|
|
|
|
* consider only pathkeys and total cost.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
add_partial_path(RelOptInfo *parent_rel, Path *new_path)
|
|
|
|
{
|
|
|
|
bool accept_new = true; /* unless we find a superior old path */
|
|
|
|
ListCell *insert_after = NULL; /* where to insert new item */
|
|
|
|
ListCell *p1;
|
|
|
|
ListCell *p1_prev;
|
|
|
|
ListCell *p1_next;
|
|
|
|
|
|
|
|
/* Check for query cancel. */
|
|
|
|
CHECK_FOR_INTERRUPTS();
|
|
|
|
|
|
|
|
/*
|
|
|
|
* As in add_path, throw out any paths which are dominated by the new
|
|
|
|
* path, but throw out the new path if some existing path dominates it.
|
|
|
|
*/
|
|
|
|
p1_prev = NULL;
|
|
|
|
for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL;
|
|
|
|
p1 = p1_next)
|
|
|
|
{
|
|
|
|
Path *old_path = (Path *) lfirst(p1);
|
|
|
|
bool remove_old = false; /* unless new proves superior */
|
|
|
|
PathKeysComparison keyscmp;
|
|
|
|
|
|
|
|
p1_next = lnext(p1);
|
|
|
|
|
|
|
|
/* Compare pathkeys. */
|
|
|
|
keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys);
|
|
|
|
|
|
|
|
/* Unless pathkeys are incompatible, keep just one of the two paths. */
|
|
|
|
if (keyscmp != PATHKEYS_DIFFERENT)
|
|
|
|
{
|
|
|
|
if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
|
|
|
|
{
|
|
|
|
/* New path costs more; keep it only if pathkeys are better. */
|
|
|
|
if (keyscmp != PATHKEYS_BETTER1)
|
|
|
|
accept_new = false;
|
|
|
|
}
|
|
|
|
else if (old_path->total_cost > new_path->total_cost
|
|
|
|
* STD_FUZZ_FACTOR)
|
|
|
|
{
|
|
|
|
/* Old path costs more; keep it only if pathkeys are better. */
|
|
|
|
if (keyscmp != PATHKEYS_BETTER2)
|
|
|
|
remove_old = true;
|
|
|
|
}
|
|
|
|
else if (keyscmp == PATHKEYS_BETTER1)
|
|
|
|
{
|
|
|
|
/* Costs are about the same, new path has better pathkeys. */
|
|
|
|
remove_old = true;
|
|
|
|
}
|
|
|
|
else if (keyscmp == PATHKEYS_BETTER2)
|
|
|
|
{
|
|
|
|
/* Costs are about the same, old path has better pathkeys. */
|
|
|
|
accept_new = false;
|
|
|
|
}
|
|
|
|
else if (old_path->total_cost > new_path->total_cost * 1.0000000001)
|
|
|
|
{
|
|
|
|
/* Pathkeys are the same, and the old path costs more. */
|
|
|
|
remove_old = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Pathkeys are the same, and new path isn't materially
|
|
|
|
* cheaper.
|
|
|
|
*/
|
|
|
|
accept_new = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove current element from partial_pathlist if dominated by new.
|
|
|
|
*/
|
|
|
|
if (remove_old)
|
|
|
|
{
|
|
|
|
parent_rel->partial_pathlist =
|
|
|
|
list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
|
|
|
|
/* add_path has a special case for IndexPath; we don't need it */
|
|
|
|
Assert(!IsA(old_path, IndexPath));
|
|
|
|
pfree(old_path);
|
|
|
|
/* p1_prev does not advance */
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* new belongs after this old path if it has cost >= old's */
|
|
|
|
if (new_path->total_cost >= old_path->total_cost)
|
|
|
|
insert_after = p1;
|
|
|
|
/* p1_prev advances */
|
|
|
|
p1_prev = p1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we found an old path that dominates new_path, we can quit
|
|
|
|
* scanning the partial_pathlist; we will not add new_path, and we
|
|
|
|
* assume new_path cannot dominate any later path.
|
|
|
|
*/
|
|
|
|
if (!accept_new)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (accept_new)
|
|
|
|
{
|
|
|
|
/* Accept the new path: insert it at proper place */
|
|
|
|
if (insert_after)
|
|
|
|
lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
|
|
|
|
else
|
|
|
|
parent_rel->partial_pathlist =
|
|
|
|
lcons(new_path, parent_rel->partial_pathlist);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* add_path has a special case for IndexPath; we don't need it */
|
|
|
|
Assert(!IsA(new_path, IndexPath));
|
|
|
|
/* Reject and recycle the new path */
|
|
|
|
pfree(new_path);
|
|
|
|
}
|
|
|
|
}
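The fuzzy cost comparisons in add_partial_path above treat costs within about 1% of each other as ties, so that pathkeys can break near-ties. The following standalone sketch isolates that test; `Cost` and `SKETCH_FUZZ_FACTOR` here are local stand-ins for the planner's `Cost` type and `STD_FUZZ_FACTOR`, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the planner's Cost type and its 1% fuzz factor;
 * sketch definitions only, not the real planner declarations. */
typedef double Cost;
#define SKETCH_FUZZ_FACTOR 1.01

/* True when new_cost is "fuzzily worse": more than 1% above old_cost,
 * mirroring the total_cost > old_path->total_cost * STD_FUZZ_FACTOR
 * tests in add_partial_path. */
static bool
cost_fuzzily_worse(Cost new_cost, Cost old_cost)
{
	return new_cost > old_cost * SKETCH_FUZZ_FACTOR;
}
```

When neither path is fuzzily worse, the costs count as "about the same" and the `keyscmp` result alone decides which path survives.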
|
|
|
|
|
|
|
|
/*
|
|
|
|
* add_partial_path_precheck
|
|
|
|
* Check whether a proposed new partial path could possibly get accepted.
|
|
|
|
*
|
|
|
|
* Unlike add_path_precheck, we can ignore startup cost and parameterization,
|
|
|
|
* since they don't matter for partial paths (see add_partial_path). But
|
|
|
|
* we do want to make sure we don't add a partial path if there's already
|
|
|
|
* a complete path that dominates it, since in that case the proposed path
|
|
|
|
* is surely a loser.
|
|
|
|
*/
|
|
|
|
bool
|
|
|
|
add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
|
|
|
|
List *pathkeys)
|
|
|
|
{
|
|
|
|
ListCell *p1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Our goal here is twofold. First, we want to find out whether this path
|
|
|
|
* is clearly inferior to some existing partial path. If so, we want to
|
|
|
|
* reject it immediately. Second, we want to find out whether this path
|
|
|
|
* is clearly superior to some existing partial path -- at least, modulo
|
|
|
|
* final cost computations. If so, we definitely want to consider it.
|
|
|
|
*
|
|
|
|
* Unlike add_path(), we always compare pathkeys here. This is because we
|
|
|
|
* expect partial_pathlist to be very short, and getting a definitive
|
|
|
|
* answer at this stage avoids the need to call add_path_precheck.
|
|
|
|
*/
|
|
|
|
foreach(p1, parent_rel->partial_pathlist)
|
|
|
|
{
|
|
|
|
Path *old_path = (Path *) lfirst(p1);
|
|
|
|
PathKeysComparison keyscmp;
|
|
|
|
|
|
|
|
keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys);
|
|
|
|
if (keyscmp != PATHKEYS_DIFFERENT)
|
|
|
|
{
|
|
|
|
if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR &&
|
|
|
|
keyscmp != PATHKEYS_BETTER1)
|
|
|
|
return false;
|
|
|
|
if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR &&
|
|
|
|
keyscmp != PATHKEYS_BETTER2)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This path is neither clearly inferior to an existing partial path nor
|
|
|
|
* clearly good enough that it might replace one. Compare it to
|
|
|
|
* non-parallel plans. If it loses even before accounting for the cost of
|
|
|
|
* the Gather node, we should definitely reject it.
|
|
|
|
*
|
|
|
|
* Note that we pass the total_cost to add_path_precheck twice. This is
|
|
|
|
* because it's never advantageous to consider the startup cost of a
|
|
|
|
* partial path; the resulting plans, if run in parallel, will be run to
|
|
|
|
* completion.
|
|
|
|
*/
|
|
|
|
if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
|
|
|
|
NULL))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
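The accept/reject scan in add_partial_path_precheck above can be sketched as a standalone function. Everything below is a simplified hypothetical model: the enum mimics PathKeysComparison, the cost array stands in for partial_pathlist, and the fuzz constant mirrors the 1% STD_FUZZ_FACTOR.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins; sketch only, not the real planner structures. */
typedef double Cost;
typedef enum
{
	KEYS_EQUAL,				/* pathkeys identical */
	KEYS_BETTER_NEW,		/* new path's ordering is strictly better */
	KEYS_BETTER_OLD,		/* old path's ordering is strictly better */
	KEYS_DIFFERENT			/* incomparable orderings */
} KeysCmp;
#define SKETCH_FUZZ_FACTOR 1.01

/* Mirrors the precheck loop: reject (false) as soon as some old path
 * fuzzily dominates the candidate; accept (true) if the candidate
 * clearly beats an old path; otherwise return true as a "maybe" (the
 * real code then falls through to add_path_precheck). */
static bool
partial_precheck(const Cost *old_costs, const KeysCmp *cmps, int n,
				 Cost new_cost)
{
	for (int i = 0; i < n; i++)
	{
		if (cmps[i] == KEYS_DIFFERENT)
			continue;
		if (new_cost > old_costs[i] * SKETCH_FUZZ_FACTOR &&
			cmps[i] != KEYS_BETTER_NEW)
			return false;
		if (old_costs[i] > new_cost * SKETCH_FUZZ_FACTOR &&
			cmps[i] != KEYS_BETTER_OLD)
			return true;
	}
	return true;
}
```

With one existing path of cost 100 and equal pathkeys, a candidate costing 150 is rejected outright, one costing 50 is clearly worth building, and one costing 100 gets no verdict from the partial list.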
|
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
|
|
|
|
/*****************************************************************************
|
1997-09-07 07:04:48 +02:00
|
|
|
* PATH NODE CREATION ROUTINES
|
1996-07-09 08:22:35 +02:00
|
|
|
*****************************************************************************/
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* create_seqscan_path
|
1997-09-07 07:04:48 +02:00
|
|
|
* Creates a path corresponding to a sequential scan, returning the
|
|
|
|
* pathnode.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
1998-02-26 05:46:47 +01:00
|
|
|
Path *
|
2015-11-11 14:57:52 +01:00
|
|
|
create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
|
2016-01-20 20:29:22 +01:00
|
|
|
Relids required_outer, int parallel_degree)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
1997-09-08 04:41:22 +02:00
|
|
|
Path *pathnode = makeNode(Path);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
|
|
|
pathnode->pathtype = T_SeqScan;
|
|
|
|
pathnode->parent = rel;
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
pathnode->param_info = get_baserel_parampathinfo(root, rel,
|
|
|
|
required_outer);
|
2016-01-20 20:29:22 +01:00
|
|
|
pathnode->parallel_aware = (parallel_degree > 0);
|
|
|
|
pathnode->parallel_safe = rel->consider_parallel;
|
|
|
|
pathnode->parallel_degree = parallel_degree;
|
1999-08-16 04:17:58 +02:00
|
|
|
pathnode->pathkeys = NIL; /* seqscan has unordered result */
|
2000-02-15 21:49:31 +01:00
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
cost_seqscan(pathnode, root, rel, pathnode->param_info);
|
1999-07-27 05:51:11 +02:00
|
|
|
|
1998-09-01 05:29:17 +02:00
|
|
|
return pathnode;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2015-05-15 20:37:10 +02:00
|
|
|
/*
|
|
|
|
* create_samplescan_path
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
* Creates a path node for a sampled table scan.
|
2015-05-15 20:37:10 +02:00
|
|
|
*/
|
|
|
|
Path *
|
|
|
|
create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
|
|
|
|
{
|
2015-05-24 03:35:49 +02:00
|
|
|
Path *pathnode = makeNode(Path);
|
2015-05-15 20:37:10 +02:00
|
|
|
|
|
|
|
pathnode->pathtype = T_SampleScan;
|
|
|
|
pathnode->parent = rel;
|
|
|
|
pathnode->param_info = get_baserel_parampathinfo(root, rel,
|
|
|
|
required_outer);
|
2015-11-11 14:57:52 +01:00
|
|
|
pathnode->parallel_aware = false;
|
2016-01-20 20:29:22 +01:00
|
|
|
pathnode->parallel_safe = rel->consider_parallel;
|
|
|
|
pathnode->parallel_degree = 0;
|
2015-05-15 20:37:10 +02:00
|
|
|
pathnode->pathkeys = NIL; /* samplescan has unordered result */
|
|
|
|
|
Redesign tablesample method API, and do extensive code review.
2015-07-25 20:39:00 +02:00
|
|
|
cost_samplescan(pathnode, root, rel, pathnode->param_info);
|
2015-05-15 20:37:10 +02:00
|
|
|
|
|
|
|
return pathnode;
|
|
|
|
}
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* create_index_path
|
1999-07-30 06:07:25 +02:00
|
|
|
* Creates a path node for an index scan.
|
1997-09-07 07:04:48 +02:00
|
|
|
*
|
2005-03-27 08:29:49 +02:00
|
|
|
* 'index' is a usable index.
|
2011-12-25 01:03:21 +01:00
|
|
|
* 'indexclauses' is a list of RestrictInfo nodes representing clauses
|
1999-07-30 06:07:25 +02:00
|
|
|
* to be used as index qual conditions in the scan.
|
2011-12-25 01:03:21 +01:00
|
|
|
* 'indexclausecols' is an integer list of index column numbers (zero based)
|
|
|
|
* the indexclauses can be used with.
|
|
|
|
* 'indexorderbys' is a list of bare expressions (no RestrictInfos)
|
|
|
|
* to be used as index ordering operators in the scan.
|
|
|
|
* 'indexorderbycols' is an integer list of index column numbers (zero based)
|
|
|
|
* the ordering operators can be used with.
|
2000-12-14 23:30:45 +01:00
|
|
|
* 'pathkeys' describes the ordering of the path.
|
2000-02-15 21:49:31 +01:00
|
|
|
* 'indexscandir' is ForwardScanDirection or BackwardScanDirection
|
2000-12-14 23:30:45 +01:00
|
|
|
* for an ordered index, or NoMovementScanDirection for
|
2000-02-15 21:49:31 +01:00
|
|
|
* an unordered index.
|
2011-10-08 02:13:02 +02:00
|
|
|
* 'indexonly' is true if an index-only scan is wanted.
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
* 'required_outer' is the set of outer relids for a parameterized path.
|
2012-01-28 01:26:38 +01:00
|
|
|
* 'loop_count' is the number of repetitions of the indexscan to factor into
|
|
|
|
* estimates of caching behavior.
|
1997-09-07 07:04:48 +02:00
|
|
|
*
|
1996-07-09 08:22:35 +02:00
|
|
|
* Returns the new path node.
|
|
|
|
*/
|
2001-10-25 07:50:21 +02:00
|
|
|
IndexPath *
|
2005-06-06 00:32:58 +02:00
|
|
|
create_index_path(PlannerInfo *root,
|
2000-01-09 01:26:47 +01:00
|
|
|
IndexOptInfo *index,
|
2011-12-25 01:03:21 +01:00
|
|
|
List *indexclauses,
|
|
|
|
List *indexclausecols,
|
2010-12-03 02:50:48 +01:00
|
|
|
List *indexorderbys,
|
2011-12-25 01:03:21 +01:00
|
|
|
List *indexorderbycols,
|
2000-12-14 23:30:45 +01:00
|
|
|
List *pathkeys,
|
2005-04-22 23:58:32 +02:00
|
|
|
ScanDirection indexscandir,
|
2011-10-08 02:13:02 +02:00
|
|
|
bool indexonly,
|
2012-01-28 01:26:38 +01:00
|
|
|
Relids required_outer,
|
|
|
|
double loop_count)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
1997-09-08 04:41:22 +02:00
|
|
|
IndexPath *pathnode = makeNode(IndexPath);
|
2005-04-22 23:58:32 +02:00
|
|
|
RelOptInfo *rel = index->rel;
|
|
|
|
List *indexquals,
|
2011-12-25 01:03:21 +01:00
|
|
|
*indexqualcols;
|
2005-04-22 23:58:32 +02:00
|
|
|
|
2011-10-11 20:20:06 +02:00
|
|
|
pathnode->path.pathtype = indexonly ? T_IndexOnlyScan : T_IndexScan;
|
2005-04-22 23:58:32 +02:00
|
|
|
pathnode->path.parent = rel;
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
|
|
|
|
required_outer);
|
2015-11-11 14:57:52 +01:00
|
|
|
pathnode->path.parallel_aware = false;
|
2016-01-20 20:29:22 +01:00
|
|
|
pathnode->path.parallel_safe = rel->consider_parallel;
|
|
|
|
pathnode->path.parallel_degree = 0;
|
2000-12-14 23:30:45 +01:00
|
|
|
pathnode->path.pathkeys = pathkeys;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2004-01-06 00:39:54 +01:00
|
|
|
/* Convert clauses to indexquals the executor can handle */
|
2011-12-25 01:03:21 +01:00
|
|
|
expand_indexqual_conditions(index, indexclauses, indexclausecols,
|
|
|
|
&indexquals, &indexqualcols);
|
2004-01-06 00:39:54 +01:00
|
|
|
|
2005-04-25 03:30:14 +02:00
|
|
|
/* Fill in the pathnode */
|
|
|
|
pathnode->indexinfo = index;
|
2011-12-25 01:03:21 +01:00
|
|
|
pathnode->indexclauses = indexclauses;
|
2005-04-25 03:30:14 +02:00
|
|
|
pathnode->indexquals = indexquals;
|
2011-12-25 01:03:21 +01:00
|
|
|
pathnode->indexqualcols = indexqualcols;
|
2010-12-03 02:50:48 +01:00
|
|
|
pathnode->indexorderbys = indexorderbys;
|
2011-12-25 01:03:21 +01:00
|
|
|
pathnode->indexorderbycols = indexorderbycols;
|
2000-02-15 21:49:31 +01:00
|
|
|
pathnode->indexscandir = indexscandir;
|
2000-03-22 23:08:35 +01:00
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
cost_index(pathnode, root, loop_count);
|
1999-07-26 01:07:26 +02:00
|
|
|
|
1998-09-01 05:29:17 +02:00
|
|
|
return pathnode;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2005-04-20 00:35:18 +02:00
|
|
|
/*
|
|
|
|
* create_bitmap_heap_path
|
|
|
|
* Creates a path node for a bitmap scan.
|
|
|
|
*
|
2005-04-21 21:18:13 +02:00
|
|
|
* 'bitmapqual' is a tree of IndexPath, BitmapAndPath, and BitmapOrPath nodes.
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
* 'required_outer' is the set of outer relids for a parameterized path.
|
2012-01-28 01:26:38 +01:00
|
|
|
* 'loop_count' is the number of repetitions of the indexscan to factor into
|
|
|
|
* estimates of caching behavior.
|
2006-06-06 19:59:58 +02:00
|
|
|
*
|
2012-01-28 01:26:38 +01:00
|
|
|
* loop_count should match the value used when creating the component
|
|
|
|
* IndexPaths.
|
2005-04-20 00:35:18 +02:00
|
|
|
*/
|
|
|
|
BitmapHeapPath *
|
2005-06-06 00:32:58 +02:00
|
|
|
create_bitmap_heap_path(PlannerInfo *root,
|
2005-04-20 00:35:18 +02:00
|
|
|
RelOptInfo *rel,
|
2005-04-22 23:58:32 +02:00
|
|
|
Path *bitmapqual,
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
Relids required_outer,
|
2012-01-28 01:26:38 +01:00
|
|
|
double loop_count)
|
2005-04-20 00:35:18 +02:00
|
|
|
{
|
|
|
|
BitmapHeapPath *pathnode = makeNode(BitmapHeapPath);
|
|
|
|
|
|
|
|
pathnode->path.pathtype = T_BitmapHeapScan;
|
|
|
|
pathnode->path.parent = rel;
|
Revise parameterized-path mechanism to fix assorted issues.
2012-04-19 21:52:46 +02:00
|
|
|
pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
|
|
|
|
required_outer);
|
2015-11-11 14:57:52 +01:00
|
|
|
pathnode->path.parallel_aware = false;
|
2016-01-20 20:29:22 +01:00
|
|
|
pathnode->path.parallel_safe = bitmapqual->parallel_safe;
|
|
|
|
pathnode->path.parallel_degree = 0;
|
2005-10-15 04:49:52 +02:00
|
|
|
pathnode->path.pathkeys = NIL; /* always unordered */
|
2005-04-20 00:35:18 +02:00
|
|
|
|
|
|
|
pathnode->bitmapqual = bitmapqual;
|
|
|
|
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
cost_bitmap_heap_scan(&pathnode->path, root, rel,
|
|
|
|
pathnode->path.param_info,
|
|
|
|
bitmapqual, loop_count);
|
2005-04-21 21:18:13 +02:00
|
|
|
|
|
|
|
return pathnode;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
 * create_bitmap_and_path
 *	  Creates a path node representing a BitmapAnd.
 */
BitmapAndPath *
create_bitmap_and_path(PlannerInfo *root,
					   RelOptInfo *rel,
					   List *bitmapquals)
{
	BitmapAndPath *pathnode = makeNode(BitmapAndPath);

	pathnode->path.pathtype = T_BitmapAnd;
	pathnode->path.parent = rel;
	pathnode->path.param_info = NULL;	/* not used in bitmap trees */

	/*
	 * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
	 * parallel-safe if and only if rel->consider_parallel is set.  So, we
	 * can set the flag for this path based only on the relation-level flag,
	 * without actually iterating over the list of children.
	 */
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;

	pathnode->path.pathkeys = NIL;		/* always unordered */

	pathnode->bitmapquals = bitmapquals;

	/* this sets bitmapselectivity as well as the regular cost fields: */
	cost_bitmap_and_node(pathnode, root);

	return pathnode;
}

/*
 * create_bitmap_or_path
 *	  Creates a path node representing a BitmapOr.
 */
BitmapOrPath *
create_bitmap_or_path(PlannerInfo *root,
					  RelOptInfo *rel,
					  List *bitmapquals)
{
	BitmapOrPath *pathnode = makeNode(BitmapOrPath);

	pathnode->path.pathtype = T_BitmapOr;
	pathnode->path.parent = rel;
	pathnode->path.param_info = NULL;	/* not used in bitmap trees */

	/*
	 * Currently, a BitmapHeapPath, BitmapAndPath, or BitmapOrPath will be
	 * parallel-safe if and only if rel->consider_parallel is set.  So, we
	 * can set the flag for this path based only on the relation-level flag,
	 * without actually iterating over the list of children.
	 */
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;

	pathnode->path.pathkeys = NIL;		/* always unordered */

	pathnode->bitmapquals = bitmapquals;

	/* this sets bitmapselectivity as well as the regular cost fields: */
	cost_bitmap_or_node(pathnode, root);

	return pathnode;
}

/*
 * create_tidscan_path
 *	  Creates a path corresponding to a scan by TID, returning the pathnode.
 */
TidPath *
create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
					Relids required_outer)
{
	TidPath    *pathnode = makeNode(TidPath);

	pathnode->path.pathtype = T_TidScan;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
														  required_outer);
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;
	pathnode->path.pathkeys = NIL;		/* always unordered */

	pathnode->tidquals = tidquals;

	cost_tidscan(&pathnode->path, root, rel, tidquals,
				 pathnode->path.param_info);

	return pathnode;
}

/*
 * create_append_path
 *	  Creates a path corresponding to an Append plan, returning the
 *	  pathnode.
 *
 * Note that we must handle subpaths = NIL, representing a dummy access path.
 */
AppendPath *
create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
				   int parallel_degree)
{
	AppendPath *pathnode = makeNode(AppendPath);
	ListCell   *l;

	pathnode->path.pathtype = T_Append;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_appendrel_parampathinfo(rel,
															required_outer);
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = parallel_degree;
	pathnode->path.pathkeys = NIL;		/* result is always considered
										 * unsorted */
	pathnode->subpaths = subpaths;

	/*
	 * We don't bother with inventing a cost_append(), but just do it here.
	 *
	 * Compute rows and costs as sums of subplan rows and costs.  We charge
	 * nothing extra for the Append itself, which perhaps is too optimistic,
	 * but since it doesn't do any selection or projection, it is a pretty
	 * cheap node.  If you change this, see also make_append().
	 */
	pathnode->path.rows = 0;
	pathnode->path.startup_cost = 0;
	pathnode->path.total_cost = 0;
	foreach(l, subpaths)
	{
		Path	   *subpath = (Path *) lfirst(l);

		pathnode->path.rows += subpath->rows;

		if (l == list_head(subpaths))	/* first node? */
			pathnode->path.startup_cost = subpath->startup_cost;
		pathnode->path.total_cost += subpath->total_cost;
		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
			subpath->parallel_safe;

		/* All child paths must have same parameterization */
		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
	}

	return pathnode;
}
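The aggregation rule used above can be shown in a small standalone sketch (this is a toy model for illustration only — `ToyCost` and `toy_append_cost` are invented names, and plain arrays stand in for the planner's `List`/`Path` structures): an Append's startup cost is its first child's startup cost, while its total cost and rowcount are sums over all children.

```c
#include <assert.h>

/* Simplified stand-in for the cost fields of a Path node. */
typedef struct
{
	double		startup_cost;
	double		total_cost;
	double		rows;
} ToyCost;

/*
 * Aggregate child costs the way create_append_path does: startup cost
 * comes from the first child only; total cost and rows are summed.
 */
static ToyCost
toy_append_cost(const ToyCost *children, int nchildren)
{
	ToyCost		result = {0.0, 0.0, 0.0};

	for (int i = 0; i < nchildren; i++)
	{
		if (i == 0)				/* first node? */
			result.startup_cost = children[i].startup_cost;
		result.total_cost += children[i].total_cost;
		result.rows += children[i].rows;
	}
	return result;
}
```

Charging nothing extra for the Append itself matches the comment in the real function: the node does no selection or projection, so its own overhead is treated as negligible.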

/*
 * create_merge_append_path
 *	  Creates a path corresponding to a MergeAppend plan, returning the
 *	  pathnode.
 */
MergeAppendPath *
create_merge_append_path(PlannerInfo *root,
						 RelOptInfo *rel,
						 List *subpaths,
						 List *pathkeys,
						 Relids required_outer)
{
	MergeAppendPath *pathnode = makeNode(MergeAppendPath);
	Cost		input_startup_cost;
	Cost		input_total_cost;
	ListCell   *l;

	pathnode->path.pathtype = T_MergeAppend;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_appendrel_parampathinfo(rel,
															required_outer);
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;
	pathnode->path.pathkeys = pathkeys;
	pathnode->subpaths = subpaths;

	/*
	 * Apply query-wide LIMIT if known and path is for sole base relation.
	 * (Handling this at this low level is a bit klugy.)
	 */
	if (bms_equal(rel->relids, root->all_baserels))
		pathnode->limit_tuples = root->limit_tuples;
	else
		pathnode->limit_tuples = -1.0;

	/*
	 * Add up the sizes and costs of the input paths.
	 */
	pathnode->path.rows = 0;
	input_startup_cost = 0;
	input_total_cost = 0;
	foreach(l, subpaths)
	{
		Path	   *subpath = (Path *) lfirst(l);

		pathnode->path.rows += subpath->rows;
		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
			subpath->parallel_safe;

		if (pathkeys_contained_in(pathkeys, subpath->pathkeys))
		{
			/* Subpath is adequately ordered, we won't need to sort it */
			input_startup_cost += subpath->startup_cost;
			input_total_cost += subpath->total_cost;
		}
		else
		{
			/* We'll need to insert a Sort node, so include cost for that */
			Path		sort_path;		/* dummy for result of cost_sort */

			cost_sort(&sort_path,
					  root,
					  pathkeys,
					  subpath->total_cost,
					  subpath->parent->tuples,
					  subpath->parent->width,
					  0.0,
					  work_mem,
					  pathnode->limit_tuples);
			input_startup_cost += sort_path.startup_cost;
			input_total_cost += sort_path.total_cost;
		}
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
/* All child paths must have same parameterization */
|
|
|
|
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
|
2010-10-14 22:56:39 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Now we can compute total costs of the MergeAppend */
|
|
|
|
cost_merge_append(&pathnode->path, root,
|
|
|
|
pathkeys, list_length(subpaths),
|
|
|
|
input_startup_cost, input_total_cost,
|
|
|
|
rel->tuples);
|
|
|
|
|
|
|
|
return pathnode;
|
|
|
|
}
|
|
|
|
|

/*
 * create_result_path
 *	  Creates a path representing a Result-and-nothing-else plan.
 *	  This is only used for the case of a query with an empty jointree.
 */
ResultPath *
create_result_path(RelOptInfo *rel, List *quals)
{
	ResultPath *pathnode = makeNode(ResultPath);

	pathnode->path.pathtype = T_Result;
	pathnode->path.parent = NULL;
	pathnode->path.param_info = NULL;	/* there are no other rels... */
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;
	pathnode->path.pathkeys = NIL;
	pathnode->quals = quals;

	/* Hardly worth defining a cost_result() function ... just do it */
	pathnode->path.rows = 1;
	pathnode->path.startup_cost = 0;
	pathnode->path.total_cost = cpu_tuple_cost;

	/*
	 * In theory we should include the qual eval cost as well, but at present
	 * that doesn't accomplish much except duplicate work that will be done
	 * again in make_result; since this is only used for degenerate cases,
	 * nothing interesting will be done with the path cost values...
	 */

	return pathnode;
}

/*
 * create_material_path
 *	  Creates a path corresponding to a Material plan, returning the
 *	  pathnode.
 */
MaterialPath *
create_material_path(RelOptInfo *rel, Path *subpath)
{
	MaterialPath *pathnode = makeNode(MaterialPath);

	Assert(subpath->parent == rel);

	pathnode->path.pathtype = T_Material;
	pathnode->path.parent = rel;
	pathnode->path.param_info = subpath->param_info;
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = subpath->parallel_safe;
	pathnode->path.parallel_degree = 0;
	pathnode->path.pathkeys = subpath->pathkeys;

	pathnode->subpath = subpath;

	cost_material(&pathnode->path,
				  subpath->startup_cost,
				  subpath->total_cost,
				  subpath->rows,
				  rel->width);

	return pathnode;
}

/*
 * create_unique_path
 *	  Creates a path representing elimination of distinct rows from the
 *	  input data.  Distinct-ness is defined according to the needs of the
 *	  semijoin represented by sjinfo.  If it is not possible to identify
 *	  how to make the data unique, NULL is returned.
 *
 * If used at all, this is likely to be called repeatedly on the same rel;
 * and the input subpath should always be the same (the cheapest_total path
 * for the rel).  So we cache the result.
 */
UniquePath *
create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
				   SpecialJoinInfo *sjinfo)
{
	UniquePath *pathnode;
	Path		sort_path;		/* dummy for result of cost_sort */
	Path		agg_path;		/* dummy for result of cost_agg */
	MemoryContext oldcontext;
	int			numCols;

	/* Caller made a mistake if subpath isn't cheapest_total ... */
	Assert(subpath == rel->cheapest_total_path);
	Assert(subpath->parent == rel);
	/* ... or if SpecialJoinInfo is the wrong one */
	Assert(sjinfo->jointype == JOIN_SEMI);
	Assert(bms_equal(rel->relids, sjinfo->syn_righthand));

	/* If result already cached, return it */
	if (rel->cheapest_unique_path)
		return (UniquePath *) rel->cheapest_unique_path;

	/* If it's not possible to unique-ify, return NULL */
	if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
		return NULL;

	/*
	 * We must ensure path struct and subsidiary data are allocated in main
	 * planning context; otherwise GEQO memory management causes trouble.
	 */
	oldcontext = MemoryContextSwitchTo(root->planner_cxt);

	pathnode = makeNode(UniquePath);

	pathnode->path.pathtype = T_Unique;
	pathnode->path.parent = rel;
	pathnode->path.param_info = subpath->param_info;
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = subpath->parallel_safe;
	pathnode->path.parallel_degree = 0;

	/*
	 * Assume the output is unsorted, since we don't necessarily have pathkeys
	 * to represent it.  (This might get overridden below.)
	 */
	pathnode->path.pathkeys = NIL;

	pathnode->subpath = subpath;
	pathnode->in_operators = sjinfo->semi_operators;
	pathnode->uniq_exprs = sjinfo->semi_rhs_exprs;

	/*
	 * If the input is a relation and it has a unique index that proves the
	 * semi_rhs_exprs are unique, then we don't need to do anything.  Note
	 * that relation_has_unique_index_for automatically considers restriction
	 * clauses for the rel, as well.
	 */
	if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
		relation_has_unique_index_for(root, rel, NIL,
Improve planner's cost estimation in the presence of semijoins.
If we have a semijoin, say
SELECT * FROM x WHERE x1 IN (SELECT y1 FROM y)
and we're estimating the cost of a parameterized indexscan on x, the number
of repetitions of the indexscan should not be taken as the size of y; it'll
really only be the number of distinct values of y1, because the only valid
plan with y on the outside of a nestloop would require y to be unique-ified
before joining it to x. Most of the time this doesn't make that much
difference, but sometimes it can lead to drastically underestimating the
cost of the indexscan and hence choosing a bad plan, as pointed out by
David Kubečka.
Fixing this is a bit difficult because parameterized indexscans are costed
out quite early in the planning process, before we have the information
that would be needed to call estimate_num_groups() and thereby estimate the
number of distinct values of the join column(s). However we can move the
code that extracts a semijoin RHS's unique-ification columns, so that it's
done in initsplan.c rather than on-the-fly in create_unique_path(). That
shouldn't make any difference speed-wise and it's really a bit cleaner too.
The other bit of information we need is the size of the semijoin RHS,
which is easy if it's a single relation (we make those estimates before
considering indexscan costs) but problematic if it's a join relation.
The solution adopted here is just to use the product of the sizes of the
join component rels. That will generally be an overestimate, but since
estimate_num_groups() only uses this input as a clamp, an overestimate
shouldn't hurt us too badly. In any case we don't allow this new logic
to produce a value larger than we would have chosen before, so that at
worst an overestimate leaves us no wiser than we were before.
2015-03-12 02:21:00 +01:00
|
|
|
sjinfo->semi_rhs_exprs,
|
|
|
|
sjinfo->semi_operators))
|
2011-10-26 23:52:02 +02:00
|
|
|
{
|
|
|
|
pathnode->umethod = UNIQUE_PATH_NOOP;
		pathnode->path.rows = rel->rows;
		pathnode->path.startup_cost = subpath->startup_cost;
		pathnode->path.total_cost = subpath->total_cost;
		pathnode->path.pathkeys = subpath->pathkeys;

		rel->cheapest_unique_path = (Path *) pathnode;

		MemoryContextSwitchTo(oldcontext);

		return pathnode;
	}

	/*
	 * If the input is a subquery whose output must be unique already, then we
	 * don't need to do anything.  The test for uniqueness has to consider
	 * exactly which columns we are extracting; for example "SELECT DISTINCT
	 * x,y" doesn't guarantee that x alone is distinct.  So we cannot check for
	 * this optimization unless semi_rhs_exprs consists only of simple Vars
	 * referencing subquery outputs.  (Possibly we could do something with
	 * expressions in the subquery outputs, too, but for now keep it simple.)
	 */
	if (rel->rtekind == RTE_SUBQUERY)
	{
		RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);

		if (query_supports_distinctness(rte->subquery))
		{
			List	   *sub_tlist_colnos;

			sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
												   rel->relid);

			if (sub_tlist_colnos &&
				query_is_distinct_for(rte->subquery,
									  sub_tlist_colnos,
									  sjinfo->semi_operators))
			{
				pathnode->umethod = UNIQUE_PATH_NOOP;
				pathnode->path.rows = rel->rows;
				pathnode->path.startup_cost = subpath->startup_cost;
				pathnode->path.total_cost = subpath->total_cost;
				pathnode->path.pathkeys = subpath->pathkeys;

				rel->cheapest_unique_path = (Path *) pathnode;

				MemoryContextSwitchTo(oldcontext);

				return pathnode;
			}
		}
	}

	/* Estimate number of output rows */
	pathnode->path.rows = estimate_num_groups(root,
											  sjinfo->semi_rhs_exprs,
											  rel->rows,
											  NULL);
	numCols = list_length(sjinfo->semi_rhs_exprs);

	if (sjinfo->semi_can_btree)
	{
		/*
		 * Estimate cost for sort+unique implementation
		 */
		cost_sort(&sort_path, root, NIL,
				  subpath->total_cost,
				  rel->rows,
				  rel->width,
				  0.0,
				  work_mem,
				  -1.0);

		/*
		 * Charge one cpu_operator_cost per comparison per input tuple.  We
		 * assume all columns get compared at most of the tuples.  (XXX
		 * probably this is an overestimate.)  This should agree with
		 * make_unique.
		 */
		sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
	}

	if (sjinfo->semi_can_hash)
	{
		/*
		 * Estimate the overhead per hashtable entry at 64 bytes (same as in
		 * planner.c).
		 */
		int			hashentrysize = rel->width + 64;

		if (hashentrysize * pathnode->path.rows > work_mem * 1024L)
		{
			/*
			 * We should not try to hash.  Hack the SpecialJoinInfo to
			 * remember this, in case we come through here again.
			 */
			sjinfo->semi_can_hash = false;
		}
		else
			cost_agg(&agg_path, root,
					 AGG_HASHED, NULL,
					 numCols, pathnode->path.rows,
					 subpath->startup_cost,
					 subpath->total_cost,
					 rel->rows);
	}

	if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
	{
		if (agg_path.total_cost < sort_path.total_cost)
			pathnode->umethod = UNIQUE_PATH_HASH;
		else
			pathnode->umethod = UNIQUE_PATH_SORT;
	}
	else if (sjinfo->semi_can_btree)
		pathnode->umethod = UNIQUE_PATH_SORT;
	else if (sjinfo->semi_can_hash)
		pathnode->umethod = UNIQUE_PATH_HASH;
	else
	{
		/* we can get here only if we abandoned hashing above */
		MemoryContextSwitchTo(oldcontext);
		return NULL;
	}

2004-01-05 19:04:39 +01:00
|
|
|
if (pathnode->umethod == UNIQUE_PATH_HASH)
|
2003-01-22 01:07:00 +01:00
|
|
|
{
|
|
|
|
pathnode->path.startup_cost = agg_path.startup_cost;
|
|
|
|
pathnode->path.total_cost = agg_path.total_cost;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
pathnode->path.startup_cost = sort_path.startup_cost;
|
|
|
|
pathnode->path.total_cost = sort_path.total_cost;
|
|
|
|
}
|
2003-01-20 19:55:07 +01:00
|
|
|
|
|
|
|
rel->cheapest_unique_path = (Path *) pathnode;
|
|
|
|
|
2008-08-14 20:48:00 +02:00
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
2003-01-20 19:55:07 +01:00
|
|
|
return pathnode;
|
|
|
|
}
|
|
|
|
|
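The semijoin-estimation commit message above describes clamping the unique-ification estimate: the product of the RHS component rel sizes will generally overestimate, so it is never allowed to exceed the value that would have been chosen before. A minimal standalone sketch of that clamp (the helper name is hypothetical, not part of pathnode.c):

```c
#include <assert.h>

/*
 * Hypothetical sketch: pick the repetition count for a parameterized
 * scan under a semijoin.  The product of the RHS component rel sizes
 * may overestimate the number of distinct outer values, so clamp it
 * to the prior estimate; at worst we are no wiser than before.
 */
static double
clamped_semijoin_rows(double rhs_size_product, double prior_estimate)
{
	return (rhs_size_product < prior_estimate) ? rhs_size_product
		: prior_estimate;
}
```

With a large RHS product the prior estimate wins; with a small one the product is used directly.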
/*
 * create_gather_path
 *
 *	  Creates a path corresponding to a gather scan, returning the
 *	  pathnode.
 */
GatherPath *
create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
				   Relids required_outer)
{
	GatherPath *pathnode = makeNode(GatherPath);

	Assert(subpath->parallel_safe);

	pathnode->path.pathtype = T_Gather;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
														  required_outer);
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = false;
	pathnode->path.parallel_degree = subpath->parallel_degree;
	pathnode->path.pathkeys = NIL;		/* Gather has unordered result */

	pathnode->subpath = subpath;
	pathnode->single_copy = false;

	if (pathnode->path.parallel_degree == 0)
	{
		pathnode->path.parallel_degree = 1;
		pathnode->path.pathkeys = subpath->pathkeys;
		pathnode->single_copy = true;
	}

	cost_gather(pathnode, root, rel, pathnode->path.param_info);

	return pathnode;
}

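The single-copy branch in create_gather_path is the interesting decision: when the subpath reports a parallel degree of zero, Gather runs exactly one copy in a worker and can keep the subpath's ordering. A minimal standalone sketch of that decision (types and names here are hypothetical, not PostgreSQL's):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the single-copy decision in create_gather_path. */
typedef struct
{
	int			parallel_degree;
	bool		single_copy;
	bool		keep_subpath_pathkeys;
} GatherChoice;

static GatherChoice
choose_gather_mode(int subpath_degree)
{
	GatherChoice c = {subpath_degree, false, false};

	if (c.parallel_degree == 0)
	{
		/* exactly one worker runs the plan; order is preserved */
		c.parallel_degree = 1;
		c.single_copy = true;
		c.keep_subpath_pathkeys = true;
	}
	return c;
}
```

With multiple workers the result is treated as unordered; only the single-copy case can claim the subpath's pathkeys.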
/*
 * translate_sub_tlist - get subquery column numbers represented by tlist
 *
 * The given targetlist usually contains only Vars referencing the given relid.
 * Extract their varattnos (ie, the column numbers of the subquery) and return
 * as an integer List.
 *
 * If any of the tlist items is not a simple Var, we cannot determine whether
 * the subquery's uniqueness condition (if any) matches ours, so punt and
 * return NIL.
 */
static List *
translate_sub_tlist(List *tlist, int relid)
{
	List	   *result = NIL;
	ListCell   *l;

	foreach(l, tlist)
	{
		Var		   *var = (Var *) lfirst(l);

		if (!var || !IsA(var, Var) ||
			var->varno != relid)
			return NIL;			/* punt */

		result = lappend_int(result, var->varattno);
	}
	return result;
}

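The punt-on-anything-but-a-simple-Var behaviour of translate_sub_tlist can be shown with a simplified, self-contained model (the struct and function here are hypothetical stand-ins, not PostgreSQL's List/Var machinery):

```c
#include <assert.h>

/* Simplified model of a targetlist entry. */
typedef struct
{
	int			is_var;			/* 1 if a simple Var */
	int			varno;			/* owning relation */
	int			varattno;		/* column number within the subquery */
} TlistEntry;

/*
 * Collect varattnos into out[]; return the count, or -1 to "punt"
 * (mirroring translate_sub_tlist returning NIL) if any entry is not
 * a simple Var of the given relid.
 */
static int
collect_varattnos(const TlistEntry *tlist, int n, int relid, int *out)
{
	int			count = 0;

	for (int i = 0; i < n; i++)
	{
		if (!tlist[i].is_var || tlist[i].varno != relid)
			return -1;			/* punt */
		out[count++] = tlist[i].varattno;
	}
	return count;
}
```

A single non-Var (or wrong-relid) entry invalidates the whole list, because a partial answer could not be matched against the subquery's uniqueness condition.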
/*
 * create_subqueryscan_path
 *	  Creates a path corresponding to a sequential scan of a subquery,
 *	  returning the pathnode.
 */
Path *
create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel,
						 List *pathkeys, Relids required_outer)
{
	Path	   *pathnode = makeNode(Path);

	pathnode->pathtype = T_SubqueryScan;
	pathnode->parent = rel;
	pathnode->param_info = get_baserel_parampathinfo(root, rel,
													 required_outer);
	pathnode->parallel_aware = false;
	pathnode->parallel_safe = rel->consider_parallel;
	pathnode->parallel_degree = 0;
	pathnode->pathkeys = pathkeys;

	cost_subqueryscan(pathnode, root, rel, pathnode->param_info);

	return pathnode;
}

/*
 * create_functionscan_path
 *	  Creates a path corresponding to a sequential scan of a function,
 *	  returning the pathnode.
 */
Path *
create_functionscan_path(PlannerInfo *root, RelOptInfo *rel,
						 List *pathkeys, Relids required_outer)
{
	Path	   *pathnode = makeNode(Path);

	pathnode->pathtype = T_FunctionScan;
	pathnode->parent = rel;
	pathnode->param_info = get_baserel_parampathinfo(root, rel,
													 required_outer);
	pathnode->parallel_aware = false;
	pathnode->parallel_safe = rel->consider_parallel;
	pathnode->parallel_degree = 0;
	pathnode->pathkeys = pathkeys;

	cost_functionscan(pathnode, root, rel, pathnode->param_info);

	return pathnode;
}

/*
 * create_valuesscan_path
 *	  Creates a path corresponding to a scan of a VALUES list,
 *	  returning the pathnode.
 */
Path *
create_valuesscan_path(PlannerInfo *root, RelOptInfo *rel,
					   Relids required_outer)
{
	Path	   *pathnode = makeNode(Path);

	pathnode->pathtype = T_ValuesScan;
	pathnode->parent = rel;
	pathnode->param_info = get_baserel_parampathinfo(root, rel,
													 required_outer);
	pathnode->parallel_aware = false;
	pathnode->parallel_safe = rel->consider_parallel;
	pathnode->parallel_degree = 0;
	pathnode->pathkeys = NIL;	/* result is always unordered */

	cost_valuesscan(pathnode, root, rel, pathnode->param_info);

	return pathnode;
}

/*
 * create_ctescan_path
 *	  Creates a path corresponding to a scan of a non-self-reference CTE,
 *	  returning the pathnode.
 */
Path *
create_ctescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
{
	Path	   *pathnode = makeNode(Path);

	pathnode->pathtype = T_CteScan;
	pathnode->parent = rel;
	pathnode->param_info = get_baserel_parampathinfo(root, rel,
													 required_outer);
	pathnode->parallel_aware = false;
	pathnode->parallel_safe = rel->consider_parallel;
	pathnode->parallel_degree = 0;
	pathnode->pathkeys = NIL;	/* XXX for now, result is always unordered */

	cost_ctescan(pathnode, root, rel, pathnode->param_info);

	return pathnode;
}

/*
 * create_worktablescan_path
 *	  Creates a path corresponding to a scan of a self-reference CTE,
 *	  returning the pathnode.
 */
Path *
create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
						  Relids required_outer)
{
	Path	   *pathnode = makeNode(Path);

	pathnode->pathtype = T_WorkTableScan;
	pathnode->parent = rel;
	pathnode->param_info = get_baserel_parampathinfo(root, rel,
													 required_outer);
	pathnode->parallel_aware = false;
	pathnode->parallel_safe = rel->consider_parallel;
	pathnode->parallel_degree = 0;
	pathnode->pathkeys = NIL;	/* result is always unordered */

	/* Cost is the same as for a regular CTE scan */
	cost_ctescan(pathnode, root, rel, pathnode->param_info);

	return pathnode;
}

/*
 * create_foreignscan_path
 *	  Creates a path corresponding to a scan of a foreign table or
 *	  a foreign join, returning the pathnode.
 *
 * This function is never called from core Postgres; rather, it's expected
 * to be called by the GetForeignPaths or GetForeignJoinPaths function of
 * a foreign data wrapper.  We make the FDW supply all fields of the path,
 * since we do not have any way to calculate them in core.
 */
ForeignPath *
create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
						double rows, Cost startup_cost, Cost total_cost,
						List *pathkeys,
						Relids required_outer,
						Path *fdw_outerpath,
						List *fdw_private)
{
	ForeignPath *pathnode = makeNode(ForeignPath);

	pathnode->path.pathtype = T_ForeignScan;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
														  required_outer);
	pathnode->path.parallel_aware = false;
	pathnode->path.parallel_safe = rel->consider_parallel;
	pathnode->path.parallel_degree = 0;
	pathnode->path.rows = rows;
	pathnode->path.startup_cost = startup_cost;
	pathnode->path.total_cost = total_cost;
	pathnode->path.pathkeys = pathkeys;
Allow foreign and custom joins to handle EvalPlanQual rechecks.
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
2015-12-08 18:31:03 +01:00
|
|
|
pathnode->fdw_outerpath = fdw_outerpath;
|
2012-03-05 22:15:59 +01:00
|
|
|
pathnode->fdw_private = fdw_private;
|
2011-02-20 06:17:18 +01:00
|
|
|
|
|
|
|
return pathnode;
|
|
|
|
}

/*
 * calc_nestloop_required_outer
 *	  Compute the required_outer set for a nestloop join path
 *
 * Note: result must not share storage with either input
 */
Relids
calc_nestloop_required_outer(Path *outer_path, Path *inner_path)
{
	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
	Relids		required_outer;

	/* inner_path can require rels from outer path, but not vice versa */
	Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
	/* easy case if inner path is not parameterized */
	if (!inner_paramrels)
		return bms_copy(outer_paramrels);
	/* else, form the union ... */
	required_outer = bms_union(outer_paramrels, inner_paramrels);
	/* ... and remove any mention of now-satisfied outer rels */
	required_outer = bms_del_members(required_outer,
									 outer_path->parent->relids);
	/* maintain invariant that required_outer is exactly NULL if empty */
	if (bms_is_empty(required_outer))
	{
		bms_free(required_outer);
		required_outer = NULL;
	}
	return required_outer;
}

/*
 * calc_non_nestloop_required_outer
 *	  Compute the required_outer set for a merge or hash join path
 *
 * Note: result must not share storage with either input
 */
Relids
calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path)
{
	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
	Relids		required_outer;

	/* neither path can require rels from the other */
	Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
	Assert(!bms_overlap(inner_paramrels, outer_path->parent->relids));
	/* form the union ... */
	required_outer = bms_union(outer_paramrels, inner_paramrels);
	/* we do not need an explicit test for empty; bms_union gets it right */
	return required_outer;
}

/*
 * create_nestloop_path
 *	  Creates a pathnode corresponding to a nestloop join between two
 *	  relations.
 *
 * 'joinrel' is the join relation.
 * 'jointype' is the type of join required
 * 'workspace' is the result from initial_cost_nestloop
 * 'sjinfo' is extra info about the join for selectivity estimation
 * 'semifactors' contains valid data if jointype is SEMI or ANTI
 * 'outer_path' is the outer path
 * 'inner_path' is the inner path
 * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
 * 'pathkeys' are the path keys of the new join path
 * 'required_outer' is the set of required outer rels
 *
 * Returns the resulting path node.
 */
NestPath *
create_nestloop_path(PlannerInfo *root,
					 RelOptInfo *joinrel,
					 JoinType jointype,
					 JoinCostWorkspace *workspace,
					 SpecialJoinInfo *sjinfo,
					 SemiAntiJoinFactors *semifactors,
					 Path *outer_path,
					 Path *inner_path,
					 List *restrict_clauses,
					 List *pathkeys,
					 Relids required_outer)
{
	NestPath   *pathnode = makeNode(NestPath);
	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);

	/*
	 * If the inner path is parameterized by the outer, we must drop any
	 * restrict_clauses that are due to be moved into the inner path.  We
	 * have to do this now, rather than postpone the work till createplan
	 * time, because the restrict_clauses list can affect the size and cost
	 * estimates for this path.
	 */
	if (bms_overlap(inner_req_outer, outer_path->parent->relids))
	{
		Relids		inner_and_outer = bms_union(inner_path->parent->relids,
												inner_req_outer);
		List	   *jclauses = NIL;
		ListCell   *lc;

		foreach(lc, restrict_clauses)
		{
			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

			if (!join_clause_is_movable_into(rinfo,
											 inner_path->parent->relids,
											 inner_and_outer))
				jclauses = lappend(jclauses, rinfo);
		}
		restrict_clauses = jclauses;
	}
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
|
|
|
|
pathnode->path.pathtype = T_NestLoop;
|
|
|
|
pathnode->path.parent = joinrel;
|
|
|
|
pathnode->path.param_info =
|
|
|
|
get_joinrel_parampathinfo(root,
|
|
|
|
joinrel,
|
|
|
|
outer_path,
|
|
|
|
inner_path,
|
|
|
|
sjinfo,
|
|
|
|
required_outer,
|
|
|
|
&restrict_clauses);
|
2015-11-11 14:57:52 +01:00
|
|
|
pathnode->path.parallel_aware = false;
|
2016-01-20 20:29:22 +01:00
|
|
|
pathnode->path.parallel_safe = joinrel->consider_parallel &&
|
|
|
|
outer_path->parallel_safe && inner_path->parallel_safe;
|
|
|
|
/* This is a foolish way to estimate parallel_degree, but for now... */
|
|
|
|
pathnode->path.parallel_degree = outer_path->parallel_degree;
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
pathnode->path.pathkeys = pathkeys;
|
2000-09-12 23:07:18 +02:00
|
|
|
pathnode->jointype = jointype;
|
1997-09-07 07:04:48 +02:00
|
|
|
pathnode->outerjoinpath = outer_path;
|
|
|
|
pathnode->innerjoinpath = inner_path;
|
2000-02-07 05:41:04 +01:00
|
|
|
pathnode->joinrestrictinfo = restrict_clauses;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
final_cost_nestloop(root, pathnode, workspace, sjinfo, semifactors);
|
1999-07-27 05:51:11 +02:00
|
|
|
|
1998-09-01 05:29:17 +02:00
|
|
|
return pathnode;
|
1996-07-09 08:22:35 +02:00
|
|
|
}

/*
 * create_mergejoin_path
 *	  Creates a pathnode corresponding to a mergejoin join between
 *	  two relations
 *
 * 'joinrel' is the join relation
 * 'jointype' is the type of join required
 * 'workspace' is the result from initial_cost_mergejoin
 * 'sjinfo' is extra info about the join for selectivity estimation
 * 'outer_path' is the outer path
 * 'inner_path' is the inner path
 * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
 * 'pathkeys' are the path keys of the new join path
 * 'required_outer' is the set of required outer rels
 * 'mergeclauses' are the RestrictInfo nodes to use as merge clauses
 *		(this should be a subset of the restrict_clauses list)
 * 'outersortkeys' are the sort varkeys for the outer relation
 * 'innersortkeys' are the sort varkeys for the inner relation
 */
MergePath *
create_mergejoin_path(PlannerInfo *root,
					  RelOptInfo *joinrel,
					  JoinType jointype,
					  JoinCostWorkspace *workspace,
					  SpecialJoinInfo *sjinfo,
					  Path *outer_path,
					  Path *inner_path,
					  List *restrict_clauses,
					  List *pathkeys,
					  Relids required_outer,
					  List *mergeclauses,
					  List *outersortkeys,
					  List *innersortkeys)
{
	MergePath  *pathnode = makeNode(MergePath);

	pathnode->jpath.path.pathtype = T_MergeJoin;
	pathnode->jpath.path.parent = joinrel;
	pathnode->jpath.path.param_info =
		get_joinrel_parampathinfo(root,
								  joinrel,
								  outer_path,
								  inner_path,
								  sjinfo,
								  required_outer,
								  &restrict_clauses);
	pathnode->jpath.path.parallel_aware = false;
	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
		outer_path->parallel_safe && inner_path->parallel_safe;
	pathnode->jpath.path.parallel_degree = 0;
	pathnode->jpath.path.pathkeys = pathkeys;
	pathnode->jpath.jointype = jointype;
	pathnode->jpath.outerjoinpath = outer_path;
	pathnode->jpath.innerjoinpath = inner_path;
	pathnode->jpath.joinrestrictinfo = restrict_clauses;
	pathnode->path_mergeclauses = mergeclauses;
	pathnode->outersortkeys = outersortkeys;
	pathnode->innersortkeys = innersortkeys;
	/* pathnode->materialize_inner will be set by final_cost_mergejoin */

	final_cost_mergejoin(root, pathnode, workspace, sjinfo);

	return pathnode;
}

/*
 * create_hashjoin_path
 *	  Creates a pathnode corresponding to a hash join between two relations.
 *
 * 'joinrel' is the join relation
 * 'jointype' is the type of join required
 * 'workspace' is the result from initial_cost_hashjoin
 * 'sjinfo' is extra info about the join for selectivity estimation
 * 'semifactors' contains valid data if jointype is SEMI or ANTI
 * 'outer_path' is the cheapest outer path
 * 'inner_path' is the cheapest inner path
 * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
 * 'required_outer' is the set of required outer rels
 * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
 *		(this should be a subset of the restrict_clauses list)
 */
HashPath *
create_hashjoin_path(PlannerInfo *root,
					 RelOptInfo *joinrel,
					 JoinType jointype,
					 JoinCostWorkspace *workspace,
					 SpecialJoinInfo *sjinfo,
					 SemiAntiJoinFactors *semifactors,
					 Path *outer_path,
					 Path *inner_path,
					 List *restrict_clauses,
					 Relids required_outer,
					 List *hashclauses)
{
	HashPath   *pathnode = makeNode(HashPath);

	pathnode->jpath.path.pathtype = T_HashJoin;
	pathnode->jpath.path.parent = joinrel;
	pathnode->jpath.path.param_info =
		get_joinrel_parampathinfo(root,
								  joinrel,
								  outer_path,
								  inner_path,
								  sjinfo,
								  required_outer,
								  &restrict_clauses);
	pathnode->jpath.path.parallel_aware = false;
	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
		outer_path->parallel_safe && inner_path->parallel_safe;
	/* This is a foolish way to estimate parallel_degree, but for now... */
	pathnode->jpath.path.parallel_degree = outer_path->parallel_degree;

	/*
	 * A hashjoin never has pathkeys, since its output ordering is
	 * unpredictable due to possible batching.  XXX If the inner relation is
	 * small enough, we could instruct the executor that it must not batch,
	 * and then we could assume that the output inherits the outer relation's
	 * ordering, which might save a sort step.  However there is considerable
	 * downside if our estimate of the inner relation size is badly off.  For
	 * the moment we don't risk it.  (Note also that if we wanted to take
	 * this seriously, joinpath.c would have to consider many more paths for
	 * the outer rel than it does now.)
	 */
	pathnode->jpath.path.pathkeys = NIL;
	pathnode->jpath.jointype = jointype;
	pathnode->jpath.outerjoinpath = outer_path;
	pathnode->jpath.innerjoinpath = inner_path;
	pathnode->jpath.joinrestrictinfo = restrict_clauses;
	pathnode->path_hashclauses = hashclauses;
	/* final_cost_hashjoin will fill in pathnode->num_batches */

	final_cost_hashjoin(root, pathnode, workspace, sjinfo, semifactors);

	return pathnode;
}

/*
 * reparameterize_path
 *		Attempt to modify a Path to have greater parameterization
 *
 * We use this to attempt to bring all child paths of an appendrel to the
 * same parameterization level, ensuring that they all enforce the same set
 * of join quals (and thus that that parameterization can be attributed to
 * an append path built from such paths).  Currently, only a few path types
 * are supported here, though more could be added at need.  We return NULL
 * if we can't reparameterize the given path.
 *
 * Note: we intentionally do not pass created paths to add_path(); it would
 * possibly try to delete them on the grounds of being cost-inferior to the
 * paths they were made from, and we don't want that.  Paths made here are
 * not necessarily of general-purpose usefulness, but they can be useful
 * as members of an append path.
 */
Path *
reparameterize_path(PlannerInfo *root, Path *path,
					Relids required_outer,
					double loop_count)
{
	RelOptInfo *rel = path->parent;

	/* Can only increase, not decrease, path's parameterization */
	if (!bms_is_subset(PATH_REQ_OUTER(path), required_outer))
		return NULL;
	switch (path->pathtype)
	{
		case T_SeqScan:
			return create_seqscan_path(root, rel, required_outer, 0);
		case T_SampleScan:
			return (Path *) create_samplescan_path(root, rel, required_outer);
		case T_IndexScan:
		case T_IndexOnlyScan:
			{
				IndexPath  *ipath = (IndexPath *) path;
				IndexPath  *newpath = makeNode(IndexPath);

				/*
				 * We can't use create_index_path directly, and would not
				 * want to because it would re-compute the indexqual
				 * conditions which is wasted effort.  Instead we hack things
				 * a bit: flat-copy the path node, revise its param_info, and
				 * redo the cost estimate.
				 */
				memcpy(newpath, ipath, sizeof(IndexPath));
				newpath->path.param_info =
					get_baserel_parampathinfo(root, rel, required_outer);
				cost_index(newpath, root, loop_count);
				return (Path *) newpath;
			}
		case T_BitmapHeapScan:
			{
				BitmapHeapPath *bpath = (BitmapHeapPath *) path;
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
|
2012-06-10 21:20:04 +02:00
|
|
|
return (Path *) create_bitmap_heap_path(root,
|
|
|
|
rel,
|
|
|
|
bpath->bitmapqual,
|
|
|
|
required_outer,
|
|
|
|
loop_count);
|
|
|
|
}
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
case T_SubqueryScan:
|
|
|
|
return create_subqueryscan_path(root, rel, path->pathkeys,
|
|
|
|
required_outer);
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|