/*-------------------------------------------------------------------------
 *
 * relnode.c
 *	  Relation-node lookup/construction routines
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/optimizer/util/relnode.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <limits.h>

#include "miscadmin.h"
#include "catalog/partition.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
#include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/hsearch.h"


typedef struct JoinHashEntry
{
	Relids		join_relids;	/* hash key --- MUST BE FIRST */
	RelOptInfo *join_rel;
} JoinHashEntry;
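The comment on `join_relids` requires the hash key to be the first member. PostgreSQL's dynahash expects every entry to begin with its key. That lets a lookup copy and compare the leading `keysize` bytes of an entry without knowing the rest of its layout.

The sketch below is a hypothetical, self-contained toy table that illustrates only this layout rule. It is not dynahash, and it is not how the planner actually hashes `Relids`. The names `ToyHashTable`, `ToySlot`, `ToyJoinEntry`, `toy_hash_init`, `toy_hash_bytes` and `toy_hash_enter` are invented for illustration, and an `unsigned long` bitmask stands in for a `Relids` set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy hash table; NOT PostgreSQL's dynahash. */
#define TOY_NBUCKETS	16
#define TOY_MAX_ENTRY	64

/* Slot storage, aligned so any small entry struct can live in it. */
typedef union ToySlot
{
	max_align_t align;
	unsigned char bytes[TOY_MAX_ENTRY];
} ToySlot;

typedef struct ToyHashTable
{
	size_t		keysize;		/* bytes of key at the START of each entry */
	size_t		entrysize;		/* total entry size, including the key */
	bool		used[TOY_NBUCKETS];
	ToySlot		slots[TOY_NBUCKETS];
} ToyHashTable;

/* Example entry: key first, payload after (same layout idea as JoinHashEntry). */
typedef struct ToyJoinEntry
{
	unsigned long join_relids;	/* hash key: must be first */
	const char *join_rel;		/* payload */
} ToyJoinEntry;

static void
toy_hash_init(ToyHashTable *t, size_t keysize, size_t entrysize)
{
	assert(keysize <= entrysize && entrysize <= TOY_MAX_ENTRY);
	memset(t, 0, sizeof(*t));
	t->keysize = keysize;
	t->entrysize = entrysize;
}

/* FNV-1a over the key bytes. */
static unsigned int
toy_hash_bytes(const void *key, size_t len)
{
	const unsigned char *p = key;
	unsigned int h = 2166136261u;

	for (size_t i = 0; i < len; i++)
	{
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Find the entry whose leading keysize bytes equal *key, or create it
 * (linear probing).  Returns NULL only if the table is full.
 */
static void *
toy_hash_enter(ToyHashTable *t, const void *key, bool *found)
{
	unsigned int start = toy_hash_bytes(key, t->keysize) % TOY_NBUCKETS;

	for (unsigned int i = 0; i < TOY_NBUCKETS; i++)
	{
		unsigned int b = (start + i) % TOY_NBUCKETS;
		unsigned char *entry = t->slots[b].bytes;

		if (!t->used[b])
		{
			memset(entry, 0, t->entrysize);
			memcpy(entry, key, t->keysize);	/* key stored at offset 0 */
			t->used[b] = true;
			*found = false;
			return entry;
		}
		if (memcmp(entry, key, t->keysize) == 0)	/* key read from offset 0 */
		{
			*found = true;
			return entry;
		}
	}
	return NULL;
}
```

To use it, initialize with `toy_hash_init(&t, sizeof(unsigned long), sizeof(ToyJoinEntry))`, then call `toy_hash_enter` with a pointer to the key. On first insertion `found` is false and only the key bytes are set, so the caller fills in the payload.

The layout dependency is easy to see here. If `join_relids` were not the first member, the `memcpy` would overwrite payload bytes and the `memcmp` would compare the wrong part of the entry.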

static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
					RelOptInfo *input_rel);
static List *build_joinrel_restrictlist(PlannerInfo *root,
						   RelOptInfo *joinrel,
						   RelOptInfo *outer_rel,
						   RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
					   RelOptInfo *outer_rel,
					   RelOptInfo *inner_rel);
static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
							  List *joininfo_list,
							  List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
						  List *joininfo_list,
						  List *new_joininfo);
static void set_foreign_rel_properties(RelOptInfo *joinrel,
						   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
static void build_joinrel_partition_info(RelOptInfo *joinrel,
							 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
							 List *restrictlist, JoinType jointype);


/*
 * setup_simple_rel_arrays
 *	  Prepare the arrays we use for quickly accessing base relations.
 */
void
setup_simple_rel_arrays(PlannerInfo *root)
{
	Index		rti;
	ListCell   *lc;

	/* Arrays are accessed using RT indexes (1..N) */
	root->simple_rel_array_size = list_length(root->parse->rtable) + 1;

	/* simple_rel_array is initialized to all NULLs */
	root->simple_rel_array = (RelOptInfo **)
		palloc0(root->simple_rel_array_size * sizeof(RelOptInfo *));

	/* simple_rte_array is an array equivalent of the rtable list */
	root->simple_rte_array = (RangeTblEntry **)
		palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
	rti = 1;
	foreach(lc, root->parse->rtable)
	{
		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

		root->simple_rte_array[rti++] = rte;
	}
}
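`setup_simple_rel_arrays` sizes both arrays at `list_length(rtable) + 1`. This lets them be indexed directly by range-table index (1..N), leaving slot 0 unused. Because `palloc0` zero-fills, every `simple_rel_array` slot starts as NULL until a RelOptInfo is built for it.

The standalone sketch below mirrors that layout under stated assumptions:

- plain `calloc` replaces `palloc0`;
- a C array replaces the `List`;
- `ToyRte`, `ToyRelArrays`, `toy_setup_rel_arrays` and `toy_free_rel_arrays` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a RangeTblEntry. */
typedef struct ToyRte
{
	const char *relname;
} ToyRte;

typedef struct ToyRelArrays
{
	int			size;			/* n + 1; index 0 is never used */
	void	  **rel_array;		/* per-RT-index rel info, all NULL at first */
	ToyRte	  **rte_array;		/* rte_array[rti] == &rtable[rti - 1] */
} ToyRelArrays;

/* Build 1-based lookup arrays over an n-element range table. */
static bool
toy_setup_rel_arrays(ToyRelArrays *a, ToyRte *rtable, int n)
{
	assert(n >= 0);
	a->size = n + 1;
	/* calloc zero-fills, like palloc0, so every slot starts out NULL */
	a->rel_array = calloc((size_t) a->size, sizeof(void *));
	a->rte_array = calloc((size_t) a->size, sizeof(ToyRte *));
	if (a->rel_array == NULL || a->rte_array == NULL)
	{
		free(a->rel_array);
		free(a->rte_array);
		return false;
	}
	for (int rti = 1; rti <= n; rti++)
		a->rte_array[rti] = &rtable[rti - 1];
	return true;
}

static void
toy_free_rel_arrays(ToyRelArrays *a)
{
	free(a->rel_array);
	free(a->rte_array);
}
```

In this sketch, a loop over the range table runs `for (rti = 1; rti < a.size; rti++)`, matching the 1..N numbering of RT indexes. Wasting slot 0 buys direct indexing with no `rti - 1` adjustment at every lookup.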
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
2006-01-31 22:39:25 +01:00
|
|
|
* build_simple_rel
|
|
|
|
* Construct a new RelOptInfo for a base relation or 'other' relation.
|
2001-05-20 22:28:20 +02:00
|
|
|
*/
|
|
|
|
RelOptInfo *
|
Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-04 04:41:31 +02:00
|
|
|
build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
|
2001-05-20 22:28:20 +02:00
|
|
|
{
|
|
|
|
RelOptInfo *rel;
|
2006-01-31 22:39:25 +01:00
|
|
|
RangeTblEntry *rte;
|
2001-05-20 22:28:20 +02:00
|
|
|
|
2006-01-31 22:39:25 +01:00
|
|
|
/* Rel should not exist already */
|
2007-04-21 23:01:45 +02:00
|
|
|
Assert(relid > 0 && relid < root->simple_rel_array_size);
|
2006-01-31 22:39:25 +01:00
|
|
|
if (root->simple_rel_array[relid] != NULL)
|
|
|
|
elog(ERROR, "rel %d already exists", relid);
|
2000-11-12 01:37:02 +01:00
|
|
|
|
2007-04-21 23:01:45 +02:00
|
|
|
/* Fetch RTE for relation */
|
|
|
|
rte = root->simple_rte_array[relid];
|
|
|
|
Assert(rte != NULL);
|
|
|
|
|
2006-01-31 22:39:25 +01:00
|
|
|
rel = makeNode(RelOptInfo);
|
Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-04 04:41:31 +02:00
|
|
|
rel->reloptkind = parent ? RELOPT_OTHER_MEMBER_REL : RELOPT_BASEREL;
|
2003-02-08 21:20:55 +01:00
|
|
|
rel->relids = bms_make_singleton(relid);
|
2000-02-07 05:41:04 +01:00
|
|
|
rel->rows = 0;
|
2012-09-02 00:16:24 +02:00
|
|
|
/* cheap startup cost is interesting iff not all tuples to be retrieved */
|
|
|
|
rel->consider_startup = (root->tuple_fraction > 0);
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
rel->consider_param_startup = false; /* might get changed later */
|
|
|
|
rel->consider_parallel = false; /* might get changed later */
|
2016-03-14 21:59:59 +01:00
|
|
|
rel->reltarget = create_empty_pathtarget();
|
2000-02-07 05:41:04 +01:00
|
|
|
rel->pathlist = NIL;
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
rel->ppilist = NIL;
|
2016-01-20 20:29:22 +01:00
|
|
|
rel->partial_pathlist = NIL;
|
2000-02-15 21:49:31 +01:00
|
|
|
rel->cheapest_startup_path = NULL;
|
|
|
|
rel->cheapest_total_path = NULL;
|
2003-01-20 19:55:07 +01:00
|
|
|
rel->cheapest_unique_path = NULL;
|
2012-01-28 01:26:38 +01:00
|
|
|
rel->cheapest_parameterized_paths = NIL;
|
2015-12-11 21:52:16 +01:00
|
|
|
rel->direct_lateral_relids = NULL;
|
2015-12-08 00:56:14 +01:00
|
|
|
rel->lateral_relids = NULL;
|
2003-02-08 21:20:55 +01:00
|
|
|
rel->relid = relid;
|
2002-05-12 22:10:05 +02:00
|
|
|
rel->rtekind = rte->rtekind;
|
2003-06-30 01:05:05 +02:00
|
|
|
/* min_attr, max_attr, attr_needed, attr_widths are set below */
|
2012-08-27 04:48:55 +02:00
|
|
|
rel->lateral_vars = NIL;
|
2013-08-18 02:22:37 +02:00
|
|
|
rel->lateral_referencers = NULL;
|
2001-05-20 22:28:20 +02:00
|
|
|
rel->indexlist = NIL;
|
2017-04-08 04:20:03 +02:00
|
|
|
rel->statlist = NIL;
|
2000-02-07 05:41:04 +01:00
|
|
|
rel->pages = 0;
|
|
|
|
rel->tuples = 0;
|
2011-10-14 23:23:01 +02:00
|
|
|
rel->allvisfrac = 0;
|
2011-09-03 21:35:12 +02:00
|
|
|
rel->subroot = NULL;
|
Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
2012-09-05 18:54:03 +02:00
|
|
|
rel->subplan_params = NIL;
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
rel->rel_parallel_workers = -1; /* set up in get_relation_info */
|
Code review for foreign/custom join pushdown patch.
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments. Clean up
as follows:
* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function. In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs. Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.
* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.
* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that. Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.
* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries. The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks. It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway. I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.
* Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.
* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.
* Lots of cleanup of documentation and missed comments. Re-order some
code additions into more logical places.
2015-05-10 20:36:30 +02:00
|
|
|
rel->serverid = InvalidOid;
|
Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings. Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design. Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value. If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is. Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.
This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied. We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID. In that case, mark the plan as only
runnable by that userID.
The plancache code already had a notion of plans being userID-specific,
in order to support RLS. It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID. Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.
Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one. It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.
In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.
This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.
Etsuro Fujita and Tom Lane
Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
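The userid rule described in this commit message can be modeled as a small pure function. This is a hedged sketch, not the planner's set_foreign_rel_properties. User OIDs are plain unsigned integers, with 0 standing for "no checkAsUser" (run as the current user). The same-server check is assumed to have passed already, and propagation of useridiscurrent from sub-joins is left out. JoinUserCheck and check_join_userids are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an Oid; 0 plays the role of InvalidOid,
 * i.e. "no checkAsUser, run as the current user". */
typedef unsigned int UserOid;

/* Outcome: may the join be pushed down, as which user, and is the
 * resulting plan valid only for the current user? */
typedef struct JoinUserCheck
{
	bool		allowed;
	UserOid		userid;
	bool		useridiscurrent;
} JoinUserCheck;

static JoinUserCheck
check_join_userids(UserOid outer, UserOid inner, UserOid current_user)
{
	JoinUserCheck r = {false, 0, false};

	assert(current_user != 0);	/* a session always has a real user */

	if (outer == inner)
	{
		/* Same checkAsUser: same user mapping at runtime, no restriction */
		r.allowed = true;
		r.userid = outer;
	}
	else if (inner == 0 && outer == current_user)
	{
		/* Zero side implies the current user, which matches the other side */
		r.allowed = true;
		r.userid = outer;
		r.useridiscurrent = true;	/* plan runnable only by this user */
	}
	else if (outer == 0 && inner == current_user)
	{
		r.allowed = true;
		r.userid = inner;
		r.useridiscurrent = true;
	}
	return r;
}
```

For example, check_join_userids(0, 5, 5) permits the join but flags it as valid only for user 5, while two different nonzero users never match.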
2016-07-15 23:22:56 +02:00
	rel->userid = rte->checkAsUser;
	rel->useridiscurrent = false;
Revise FDW planning API, again.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path. Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan. We had already bitten the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.
Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.
In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution. We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker). Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)
Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
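The three-step split can be illustrated with a toy callback table. The sketch uses hypothetical, heavily simplified types (SketchRel, SketchFdwRoutine and friends) in place of PlannerInfo, RelOptInfo and the real FdwRoutine, whose callbacks take more arguments. It shows only the data flow the message describes. The size step does the expensive work once and parks the result in fdw_private, the path step reuses it to offer several alternatives, and only the chosen path reaches the plan step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner structures. */
typedef struct SketchRel
{
	double		rows;			/* set by the size-estimation step */
	int			npaths;			/* number of candidate paths offered */
	void	   *fdw_private;	/* FDW-owned state shared across steps */
} SketchRel;

typedef struct SketchFdwState
{
	int			remote_estimates;	/* how often we "asked the remote" */
	double		remote_rows;
} SketchFdwState;

/* Hypothetical three-step callback table modeled on the split above. */
typedef struct SketchFdwRoutine
{
	void		(*GetRelSize) (SketchRel *rel);
	void		(*GetPaths) (SketchRel *rel);
	int			(*GetPlan) (SketchRel *rel, int chosen_path);
} SketchFdwRoutine;

static SketchFdwState demo_state;

static void
demo_get_rel_size(SketchRel *rel)
{
	/* Expensive work done once; the result is stashed in fdw_private. */
	demo_state.remote_estimates++;
	demo_state.remote_rows = 1000.0;
	rel->fdw_private = &demo_state;
	rel->rows = demo_state.remote_rows;
}

static void
demo_get_paths(SketchRel *rel)
{
	const SketchFdwState *st = rel->fdw_private;

	assert(st != NULL);			/* size step must have run first */
	/* Cheap: reuse cached estimates to offer alternative paths. */
	rel->npaths = (st->remote_rows > 0) ? 2 : 1;
}

static int
demo_get_plan(SketchRel *rel, int chosen_path)
{
	/* Only the winning path is turned into a plan node. */
	assert(chosen_path >= 0 && chosen_path < rel->npaths);
	return chosen_path;
}

static const SketchFdwRoutine demo_routine = {
	demo_get_rel_size, demo_get_paths, demo_get_plan
};

/* Drive the three steps in planner order. */
static int
plan_foreign_scan(const SketchFdwRoutine *fdw, SketchRel *rel, int chosen)
{
	fdw->GetRelSize(rel);
	fdw->GetPaths(rel);
	return fdw->GetPlan(rel, chosen);
}
```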
2012-03-09 18:48:48 +01:00
	rel->fdwroutine = NULL;
	rel->fdw_private = NULL;
2017-04-08 04:20:03 +02:00
	rel->unique_for_rels = NIL;
	rel->non_unique_for_rels = NIL;
2000-02-07 05:41:04 +01:00
	rel->baserestrictinfo = NIL;
2003-01-12 23:35:29 +01:00
	rel->baserestrictcost.startup = 0;
	rel->baserestrictcost.per_tuple = 0;
Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
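The ordering rule can be sketched as a comparison plus a stable sort. This is a simplified model, not the planner's implementation. SketchQual is a hypothetical stand-in for RestrictInfo. Treating every leakproof qual as if it had security level 0 is this sketch's own simplification of "leakproof quals can be allowed to go ahead of quals of lower security_level". Within one effective level, cheaper quals run first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified qual; not PostgreSQL's RestrictInfo. */
typedef struct SketchQual
{
	const char *name;
	unsigned int security_level;	/* lower = must be evaluated earlier */
	bool		leakproof;		/* safe to run ahead of lower levels */
	double		cost;			/* estimated per-tuple cost */
} SketchQual;

/* Sketch assumption: a leakproof qual may jump ahead, so use level 0. */
static unsigned int
effective_level(const SketchQual *q)
{
	return q->leakproof ? 0 : q->security_level;
}

/* True if a must be evaluated before b. */
static bool
qual_precedes(const SketchQual *a, const SketchQual *b)
{
	unsigned int la = effective_level(a);
	unsigned int lb = effective_level(b);

	if (la != lb)
		return la < lb;
	return a->cost < b->cost;	/* within a level, cheaper first */
}

/* Stable insertion sort: security order first, then cost. */
static void
order_quals(SketchQual *quals, size_t n)
{
	assert(quals != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
	{
		SketchQual	tmp = quals[i];
		size_t		j = i;

		while (j > 0 && qual_precedes(&tmp, &quals[j - 1]))
		{
			quals[j] = quals[j - 1];
			j--;
		}
		quals[j] = tmp;
	}
}
```

Consider a leaky user qual at level 1, an RLS policy qual at level 0, and a leakproof user qual at level 1. The order becomes leakproof qual, policy qual, leaky qual, so the leaky function never sees rows the policy would have rejected.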
2017-01-18 18:58:20 +01:00
	rel->baserestrict_min_security = UINT_MAX;
2000-02-07 05:41:04 +01:00
	rel->joininfo = NIL;
2007-01-20 21:45:41 +01:00
	rel->has_eclass_joins = false;
2017-09-21 05:33:04 +02:00
	rel->part_scheme = NULL;
	rel->nparts = 0;
	rel->boundinfo = NULL;
	rel->part_rels = NULL;
	rel->partexprs = NULL;
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
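Because the partitioning schemes must be absolutely identical, the pairing step becomes trivial, as a small sketch can show. It assumes a hypothetical range-partitioning model where partition i covers [bounds[i], bounds[i+1]). SketchPartScheme and pair_partitions are invented names, and PostgreSQL's real partition-scheme comparison is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical range scheme: partition i covers [bounds[i], bounds[i + 1]).
 * Not PostgreSQL's PartitionScheme. */
typedef struct SketchPartScheme
{
	size_t		nparts;
	const int  *bounds;			/* nparts + 1 ascending boundary values */
} SketchPartScheme;

static bool
schemes_identical(const SketchPartScheme *a, const SketchPartScheme *b)
{
	if (a->nparts != b->nparts)
		return false;
	for (size_t i = 0; i <= a->nparts; i++)
		if (a->bounds[i] != b->bounds[i])
			return false;
	return true;
}

/*
 * For an equi-join on the partition key, set pairs[i] to the inner
 * partition that can join outer partition i.  Returns false (fall back to
 * joining the parents whole) unless the schemes are identical.
 */
static bool
pair_partitions(const SketchPartScheme *outer, const SketchPartScheme *inner,
				size_t *pairs)
{
	assert(pairs != NULL);
	if (!schemes_identical(outer, inner))
		return false;
	for (size_t i = 0; i < outer->nparts; i++)
		pairs[i] = i;			/* identical bounds => same index */
	return true;
}
```

Handling extra or unmatched partitions on one side, which the message defers to a future patch, would replace the identity mapping with a bound-matching step.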
2017-10-06 17:11:10 +02:00
	rel->nullable_partexprs = NULL;
2000-02-07 05:41:04 +01:00

Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
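The idea of kind-testing macros plus a top_parent_relids field can be modeled as follows. The model assumes a three-value enum instead of the full RelOptKind, a bitmask instead of a Relids set, and hypothetical SK_* / Sketch* names. Only the propagation logic is meant to mirror the if (parent) block that build_simple_rel uses to set top_parent_relids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of relation kinds. */
typedef enum SketchRelKind
{
	SK_BASEREL,
	SK_JOINREL,
	SK_OTHER_MEMBER_REL
} SketchRelKind;

/* "Simple" rels are scanned directly: base rels and their children. */
#define SK_IS_SIMPLE_REL(r) \
	((r)->kind == SK_BASEREL || (r)->kind == SK_OTHER_MEMBER_REL)
/* "Other" rels are children that do not appear in the main join tree. */
#define SK_IS_OTHER_REL(r)	((r)->kind == SK_OTHER_MEMBER_REL)

typedef struct SketchRel
{
	SketchRelKind kind;
	unsigned int relids;		/* bitmask standing in for a Relids set */
	unsigned int top_parent_relids;	/* 0 means "no parent" */
} SketchRel;

/* Build a rel; every descendant records the topmost ancestor's relids. */
static SketchRel
make_rel(unsigned int relids, const SketchRel *parent)
{
	SketchRel	rel;

	assert(relids != 0);
	rel.kind = parent ? SK_OTHER_MEMBER_REL : SK_BASEREL;
	rel.relids = relids;
	if (parent)
		rel.top_parent_relids = parent->top_parent_relids
			? parent->top_parent_relids	/* grandchild or deeper */
			: parent->relids;	/* direct child of the top parent */
	else
		rel.top_parent_relids = 0;
	return rel;
}
```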
2017-04-04 04:41:31 +02:00
	/*
	 * Pass top parent's relids down the inheritance hierarchy. If the parent
2017-05-17 22:31:56 +02:00
	 * has top_parent_relids set, it's a direct or an indirect child of the
	 * top parent indicated by top_parent_relids. By extension this child is
	 * also an indirect child of that parent.
2017-04-04 04:41:31 +02:00
	 */
	if (parent)
	{
		if (parent->top_parent_relids)
			rel->top_parent_relids = parent->top_parent_relids;
		else
			rel->top_parent_relids = bms_copy(parent->relids);
	}
	else
		rel->top_parent_relids = NULL;

2002-03-12 01:52:10 +01:00
	/* Check type of rtable entry */
	switch (rte->rtekind)
2000-09-29 20:21:41 +02:00
	{
2002-03-12 01:52:10 +01:00
		case RTE_RELATION:
2003-02-03 16:07:08 +01:00
			/* Table --- retrieve statistics from the system catalogs */
2006-09-20 00:49:53 +02:00
			get_relation_info(root, rte->relid, rte->inh, rel);
2003-02-03 16:07:08 +01:00
			break;
2002-03-12 01:52:10 +01:00
		case RTE_SUBQUERY:
2002-05-12 22:10:05 +02:00
		case RTE_FUNCTION:
2017-03-08 16:39:37 +01:00
		case RTE_TABLEFUNC:
2006-08-02 03:59:48 +02:00
		case RTE_VALUES:
2008-10-04 23:56:55 +02:00
		case RTE_CTE:
2017-04-01 06:17:18 +02:00
		case RTE_NAMEDTUPLESTORE:
2006-10-04 02:30:14 +02:00

2006-08-02 03:59:48 +02:00
			/*
2017-09-06 16:41:05 +02:00
			 * Subquery, function, tablefunc, values list, CTE, or ENR --- set
			 * up attr range and arrays
2006-08-02 03:59:48 +02:00
			 *
			 * Note: 0 is included in range to support whole-row Vars
			 */
2003-12-08 19:19:58 +01:00
			rel->min_attr = 0;
2004-05-31 01:40:41 +02:00
			rel->max_attr = list_length(rte->eref->colnames);
2004-12-01 20:00:56 +01:00
			rel->attr_needed = (Relids *)
				palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(Relids));
			rel->attr_widths = (int32 *)
				palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(int32));
2002-03-12 01:52:10 +01:00
			break;
		default:
2003-07-25 02:01:09 +02:00
			elog(ERROR, "unrecognized RTE kind: %d",
2002-03-12 01:52:10 +01:00
				 (int) rte->rtekind);
			break;
2000-09-29 20:21:41 +02:00
	}
2000-02-07 05:41:04 +01:00

2006-01-31 22:39:25 +01:00
	/* Save the finished struct in the query's simple_rel_array */
	root->simple_rel_array[relid] = rel;
2005-06-06 06:13:36 +02:00

2017-01-18 18:58:20 +01:00
	/*
	 * This is a convenient spot at which to note whether rels participating
	 * in the query have any securityQuals attached. If so, increase
	 * root->qual_security_level to ensure it's larger than the maximum
	 * security level needed for securityQuals.
	 */
	if (rte->securityQuals)
		root->qual_security_level = Max(root->qual_security_level,
										list_length(rte->securityQuals));

2006-09-20 00:49:53 +02:00
	/*
	 * If this rel is an appendrel parent, recurse to build "other rel"
	 * RelOptInfos for its children. They are "other rels" because they are
	 * not in the main join tree, but we will need RelOptInfos to plan access
	 * to them.
	 */
	if (rte->inh)
	{
		ListCell *l;
2017-09-21 05:33:04 +02:00
		int nparts = rel->nparts;
		int cnt_parts = 0;

		if (nparts > 0)
			rel->part_rels = (RelOptInfo **)
				palloc(sizeof(RelOptInfo *) * nparts);
2006-09-20 00:49:53 +02:00

		foreach(l, root->append_rel_list)
		{
			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
2017-09-21 05:33:04 +02:00
			RelOptInfo *childrel;
2006-09-20 00:49:53 +02:00

			/* append_rel_list contains all append rels; ignore others */
			if (appinfo->parent_relid != relid)
				continue;

2017-09-21 05:33:04 +02:00
			childrel = build_simple_rel(root, appinfo->child_relid,
										rel);

			/* Nothing more to do for an unpartitioned table. */
			if (!rel->part_scheme)
				continue;

			/*
			 * The order of partition OIDs in append_rel_list is the same as
			 * the order in the PartitionDesc, so the order of part_rels will
			 * also match the PartitionDesc. See expand_partitioned_rtentry.
			 */
			Assert(cnt_parts < nparts);
			rel->part_rels[cnt_parts] = childrel;
			cnt_parts++;
2006-09-20 00:49:53 +02:00
		}
2017-09-21 05:33:04 +02:00

		/* We should have seen all the child partitions. */
		Assert(cnt_parts == nparts);
2006-09-20 00:49:53 +02:00
	}

1997-09-07 07:04:48 +02:00
	return rel;
1996-07-09 08:22:35 +02:00
}
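The attr range arrays set up in build_simple_rel are indexed by attribute number offset by min_attr, which is why they hold max_attr - min_attr + 1 slots. The sketch below is a hypothetical model of that arithmetic, with calloc standing in for palloc0 and a plain struct instead of RelOptInfo. With min_attr = 0, slot 0 belongs to the whole-row Var.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-relation attribute bookkeeping, modeled on the
 * min_attr / max_attr / attr_widths fields set up in build_simple_rel. */
typedef struct SketchAttrInfo
{
	int			min_attr;		/* 0 here, so whole-row Vars get a slot */
	int			max_attr;		/* number of output columns */
	int		   *attr_widths;	/* one slot per attno in [min_attr, max_attr] */
} SketchAttrInfo;

/* Returns nonzero on success. */
static int
sketch_attr_init(SketchAttrInfo *info, int ncolumns)
{
	info->min_attr = 0;
	info->max_attr = ncolumns;
	/* calloc zero-fills, like palloc0 */
	info->attr_widths = calloc((size_t) (info->max_attr - info->min_attr + 1),
							   sizeof(int));
	return info->attr_widths != NULL;
}

/* Attribute numbers are offset by min_attr to get an array index. */
static int *
sketch_attr_width_slot(SketchAttrInfo *info, int attno)
{
	assert(attno >= info->min_attr && attno <= info->max_attr);
	return &info->attr_widths[attno - info->min_attr];
}
```

The same offset arithmetic keeps working unchanged if min_attr is negative, which is how a relation can reserve slots below attribute 1.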
2001-05-20 22:28:20 +02:00
/*
 * find_base_rel
2005-06-06 06:13:36 +02:00
 *	  Find a base or other relation entry, which must already exist.
2001-05-20 22:28:20 +02:00
 */
RelOptInfo *
2005-06-06 00:32:58 +02:00
find_base_rel(PlannerInfo *root, int relid)
2001-05-20 22:28:20 +02:00
{
	RelOptInfo *rel;

2005-06-06 06:13:36 +02:00
	Assert(relid > 0);
2001-05-20 22:28:20 +02:00

2006-01-31 22:39:25 +01:00
	if (relid < root->simple_rel_array_size)
2001-05-20 22:28:20 +02:00
	{
2006-01-31 22:39:25 +01:00
		rel = root->simple_rel_array[relid];
2005-06-06 06:13:36 +02:00
		if (rel)
2001-05-20 22:28:20 +02:00
			return rel;
	}

2003-07-25 02:01:09 +02:00
	elog(ERROR, "no relation entry for relid %d", relid);
2001-05-20 22:28:20 +02:00

	return NULL;				/* keep compiler quiet */
}

2005-06-09 01:02:05 +02:00
/*
 * build_join_rel_hash
 *	  Construct the auxiliary hash table for join relations.
 */
static void
build_join_rel_hash(PlannerInfo *root)
{
	HTAB *hashtab;
	HASHCTL hash_ctl;
	ListCell *l;

	/* Create the hash table */
	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
	hash_ctl.keysize = sizeof(Relids);
	hash_ctl.entrysize = sizeof(JoinHashEntry);
	hash_ctl.hash = bitmap_hash;
	hash_ctl.match = bitmap_match;
	hash_ctl.hcxt = CurrentMemoryContext;
	hashtab = hash_create("JoinRelHashTable",
						  256L,
						  &hash_ctl,
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
						  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
2005-06-09 01:02:05 +02:00

	/* Insert all the already-existing joinrels */
	foreach(l, root->join_rel_list)
	{
		RelOptInfo *rel = (RelOptInfo *) lfirst(l);
		JoinHashEntry *hentry;
		bool found;

		hentry = (JoinHashEntry *) hash_search(hashtab,
											   &(rel->relids),
											   HASH_ENTER,
											   &found);
		Assert(!found);
		hentry->join_rel = rel;
	}

	root->join_rel_hash = hashtab;
}

2000-02-07 05:41:04 +01:00
/*
 * find_join_rel
2003-02-08 21:20:55 +01:00
 *	  Returns relation entry corresponding to 'relids' (a set of RT indexes),
2000-02-07 05:41:04 +01:00
 *	  or NULL if none exists. This is for join relations.
 */
2004-02-17 01:52:53 +01:00
RelOptInfo *
2005-06-06 00:32:58 +02:00
find_join_rel(PlannerInfo *root, Relids relids)
2000-02-07 05:41:04 +01:00
{
2005-06-09 01:02:05 +02:00
	/*
2014-05-06 18:12:18 +02:00
	 * Switch to using hash lookup when list grows "too long". The threshold
2005-06-09 01:02:05 +02:00
	 * is arbitrary and is known only here.
	 */
	if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
		build_join_rel_hash(root);
2000-02-07 05:41:04 +01:00

2005-06-09 01:02:05 +02:00
	/*
	 * Use either hashtable lookup or linear search, as appropriate.
	 *
2005-11-22 19:17:34 +01:00
	 * Note: the seemingly redundant hashkey variable is used to avoid taking
	 * the address of relids; unless the compiler is exceedingly smart, doing
	 * so would force relids out of a register and thus probably slow down the
2005-10-15 04:49:52 +02:00
	 * list-search case.
2005-06-09 01:02:05 +02:00
	 */
	if (root->join_rel_hash)
2000-02-07 05:41:04 +01:00
	{
2005-06-09 01:02:05 +02:00
		Relids hashkey = relids;
		JoinHashEntry *hentry;

		hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
											   &hashkey,
											   HASH_FIND,
											   NULL);
		if (hentry)
			return hentry->join_rel;
	}
	else
	{
		ListCell *l;
2000-02-07 05:41:04 +01:00

2005-06-09 01:02:05 +02:00
		foreach(l, root->join_rel_list)
		{
			RelOptInfo *rel = (RelOptInfo *) lfirst(l);

			if (bms_equal(rel->relids, relids))
				return rel;
		}
2000-02-07 05:41:04 +01:00
	}

	return NULL;
}
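The list-plus-lazy-hash strategy of find_join_rel and add_join_rel can be modeled in self-contained C. This is a sketch under stated assumptions. Relids are a 64-bit bitmask rather than a Bitmapset. The hash is a fixed-size open-addressing table instead of dynahash's hash_create/hash_search. All Sketch* names are hypothetical. As in the code above, the list stays authoritative and append-only, so its order does not depend on whether the hash exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_JOINRELS 256
#define SKETCH_HASH_THRESHOLD 32	/* same arbitrary cutoff as find_join_rel */
#define SKETCH_HASH_SIZE 512		/* power of two, > SKETCH_MAX_JOINRELS */

typedef uint64_t SketchRelids;	/* bitmask stand-in for a Relids set */

typedef struct SketchJoinRel
{
	SketchRelids relids;
} SketchJoinRel;

typedef struct SketchPlanner
{
	SketchJoinRel *list[SKETCH_MAX_JOINRELS];	/* like join_rel_list */
	size_t		nlist;
	SketchJoinRel *hash[SKETCH_HASH_SIZE];	/* NULL = empty slot */
	int			hash_built;
} SketchPlanner;

static size_t
sketch_slot(SketchRelids key)
{
	/* Multiplicative hashing; the top 9 bits select one of 512 slots. */
	return (size_t) ((key * UINT64_C(0x9E3779B97F4A7C15)) >> 55) & (SKETCH_HASH_SIZE - 1);
}

static void
sketch_hash_insert(SketchPlanner *p, SketchJoinRel *rel)
{
	size_t		i = sketch_slot(rel->relids);

	while (p->hash[i] != NULL)	/* linear probing; table never fills */
		i = (i + 1) & (SKETCH_HASH_SIZE - 1);
	p->hash[i] = rel;
}

static void
sketch_add_join_rel(SketchPlanner *p, SketchJoinRel *rel)
{
	assert(p->nlist < SKETCH_MAX_JOINRELS);
	p->list[p->nlist++] = rel;	/* always append at the end of the list */
	if (p->hash_built)
		sketch_hash_insert(p, rel);
}

static SketchJoinRel *
sketch_find_join_rel(SketchPlanner *p, SketchRelids relids)
{
	/* Build the hash lazily once the list grows past the threshold. */
	if (!p->hash_built && p->nlist > SKETCH_HASH_THRESHOLD)
	{
		for (size_t k = 0; k < p->nlist; k++)
			sketch_hash_insert(p, p->list[k]);
		p->hash_built = 1;
	}

	if (p->hash_built)
	{
		for (size_t i = sketch_slot(relids); p->hash[i] != NULL;
			 i = (i + 1) & (SKETCH_HASH_SIZE - 1))
			if (p->hash[i]->relids == relids)
				return p->hash[i];
		return NULL;
	}

	for (size_t k = 0; k < p->nlist; k++)	/* short list: linear search */
		if (p->list[k]->relids == relids)
			return p->list[k];
	return NULL;
}
```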
2017-03-14 23:20:17 +01:00
/*
 * set_foreign_rel_properties
 *		Set up foreign-join fields if outer and inner relation are foreign
 *		tables (or joins) belonging to the same server and assigned to the same
 *		user to check access permissions as.
 *
 * In addition to an exact match of userid, we allow the case where one side
 * has zero userid (implying current user) and the other side has explicit
 * userid that happens to equal the current user; but in that case, pushdown of
 * the join is only valid for the current user. The useridiscurrent field
 * records whether we had to make such an assumption for this join or any
 * sub-join.
 *
 * Otherwise these fields are left invalid, so GetForeignJoinPaths will not be
 * called for the join relation.
 *
 */
static void
set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
						   RelOptInfo *inner_rel)
{
	if (OidIsValid(outer_rel->serverid) &&
		inner_rel->serverid == outer_rel->serverid)
	{
		if (inner_rel->userid == outer_rel->userid)
		{
			joinrel->serverid = outer_rel->serverid;
			joinrel->userid = outer_rel->userid;
			joinrel->useridiscurrent = outer_rel->useridiscurrent || inner_rel->useridiscurrent;
			joinrel->fdwroutine = outer_rel->fdwroutine;
		}
		else if (!OidIsValid(inner_rel->userid) &&
				 outer_rel->userid == GetUserId())
		{
			joinrel->serverid = outer_rel->serverid;
			joinrel->userid = outer_rel->userid;
			joinrel->useridiscurrent = true;
			joinrel->fdwroutine = outer_rel->fdwroutine;
		}
		else if (!OidIsValid(outer_rel->userid) &&
				 inner_rel->userid == GetUserId())
		{
			joinrel->serverid = outer_rel->serverid;
			joinrel->userid = inner_rel->userid;
			joinrel->useridiscurrent = true;
			joinrel->fdwroutine = outer_rel->fdwroutine;
		}
	}
}

/*
 * add_join_rel
 *		Add given join relation to the list of join relations in the given
 *		PlannerInfo. Also add it to the auxiliary hashtable if there is one.
 */
static void
add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
{
	/* GEQO requires us to append the new joinrel to the end of the list! */
	root->join_rel_list = lappend(root->join_rel_list, joinrel);

	/* store it into the auxiliary hashtable if there is one. */
	if (root->join_rel_hash)
	{
		JoinHashEntry *hentry;
		bool found;

		hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
											   &(joinrel->relids),
											   HASH_ENTER,
											   &found);
		Assert(!found);
		hentry->join_rel = joinrel;
	}
}

1997-09-07 07:04:48 +02:00
/*
2001-05-20 22:28:20 +02:00
 * build_join_rel
2000-02-07 05:41:04 +01:00
 *	  Returns relation entry corresponding to the union of two given rels,
 *	  creating a new relation entry if none already exists.
 *
2003-02-08 21:20:55 +01:00
 * 'joinrelids' is the Relids set that uniquely identifies the join
2000-02-07 05:41:04 +01:00
 * 'outer_rel' and 'inner_rel' are relation nodes for the relations to be
 *		joined
2008-08-14 20:48:00 +02:00
 * 'sjinfo': join context info
2000-02-07 05:41:04 +01:00
 * 'restrictlist_ptr': result variable. If not NULL, *restrictlist_ptr
 *		receives the list of RestrictInfo nodes that apply to this
 *		particular pair of joinable relations.
 *
 * restrictlist_ptr makes the routine's API a little grotty, but it saves
 * duplicated calculation of the restrictlist...
1996-07-09 08:22:35 +02:00
 */
1998-07-18 06:22:52 +02:00
RelOptInfo *
2005-06-06 00:32:58 +02:00
build_join_rel(PlannerInfo *root,
2003-02-08 21:20:55 +01:00
			   Relids joinrelids,
2001-05-20 22:28:20 +02:00
			   RelOptInfo *outer_rel,
			   RelOptInfo *inner_rel,
2008-08-14 20:48:00 +02:00
			   SpecialJoinInfo *sjinfo,
2001-05-20 22:28:20 +02:00
			   List **restrictlist_ptr)
1996-07-09 08:22:35 +02:00
{
2000-02-07 05:41:04 +01:00
	RelOptInfo *joinrel;
	List *restrictlist;

2017-10-06 17:11:10 +02:00
	/* This function should be used only for join between parents. */
	Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));

2000-02-07 05:41:04 +01:00
	/*
	 * See if we already have a joinrel for this set of base rels.
	 */
	joinrel = find_join_rel(root, joinrelids);

	if (joinrel)
	{
		/*
2005-10-15 04:49:52 +02:00
		 * Yes, so we only need to figure the restrictlist for this particular
		 * pair of component relations.
2000-02-07 05:41:04 +01:00
		 */
		if (restrictlist_ptr)
2001-10-18 18:11:42 +02:00
			*restrictlist_ptr = build_joinrel_restrictlist(root,
														   joinrel,
2000-02-07 05:41:04 +01:00
														   outer_rel,
2007-01-20 21:45:41 +01:00
														   inner_rel);
2000-02-07 05:41:04 +01:00
		return joinrel;
	}

	/*
	 * Nope, so make one.
	 */
	joinrel = makeNode(RelOptInfo);
2002-03-12 01:52:10 +01:00
	joinrel->reloptkind = RELOPT_JOINREL;
2003-02-08 21:20:55 +01:00
	joinrel->relids = bms_copy(joinrelids);
2000-02-07 05:41:04 +01:00
	joinrel->rows = 0;
2012-09-02 00:16:24 +02:00
	/* cheap startup cost is interesting iff not all tuples to be retrieved */
	joinrel->consider_startup = (root->tuple_fraction > 0);
Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.
When the inner side of a nestloop SEMI or ANTI join is an indexscan that
uses all the join clauses as indexquals, it can be presumed that both
matched and unmatched outer rows will be processed very quickly: for
matched rows, we'll stop after fetching one row from the indexscan, while
for unmatched rows we'll have an indexscan that finds no matching index
entries, which should also be quick. The planner already knew about this,
but it was nonetheless charging for at least one full run of the inner
indexscan, as a consequence of concerns about the behavior of materialized
inner scans --- but those concerns don't apply in the fast case. If the
inner side has low cardinality (many matching rows) this could make an
indexscan plan look far more expensive than it actually is. To fix,
rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
don't add the inner scan cost until we've inspected the indexquals, and
then we can add either the full-run cost or just the first tuple's cost as
appropriate.
Experimentation with this fix uncovered another problem: add_path and
friends were coded to disregard cheap startup cost when considering
parameterized paths. That's usually okay (and desirable, because it thins
the path herd faster); but in this fast case for SEMI/ANTI joins, it could
result in throwing away the desired plain indexscan path in favor of a
bitmap scan path before we ever get to the join costing logic. In the
many-matching-rows cases of interest here, a bitmap scan will do a lot more
work than required, so this is a problem. To fix, add a per-relation flag
consider_param_startup that works like the existing consider_startup flag,
but applies to parameterized paths, and set it for relations that are the
inside of a SEMI or ANTI join.
To make this patch reasonably safe to back-patch, care has been taken to
avoid changing the planner's behavior except in the very narrow case of
SEMI/ANTI joins with inner indexscans. There are places in
compare_path_costs_fuzzily and add_path_precheck that are not terribly
consistent with the new approach, but changing them will affect planner
decisions at the margins in other cases, so we'll leave that for a
HEAD-only fix.
Back-patch to 9.3; before that, the consider_startup flag didn't exist,
meaning that the second aspect of the patch would be too invasive.
Per a complaint from Peter Holzer and analysis by Tomas Vondra.
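The fast-case costing idea can be reduced to a per-outer-row formula. This is a hedged sketch, not initial_cost_nestloop/final_cost_nestloop. SketchInnerScan is hypothetical, and the real code distinguishes matched from unmatched outer rows and accounts for rescans separately. The sketch only illustrates the switch from charging a full inner run to charging roughly the first tuple's cost when every join clause is an indexqual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inner-scan cost summary (not the Path API). */
typedef struct SketchInnerScan
{
	double		startup_cost;
	double		total_cost;		/* cost of one full run of the inner scan */
	double		rows;			/* rows a full run would return */
} SketchInnerScan;

/*
 * Inner-side cost charged per outer row of a SEMI/ANTI nestloop.  When all
 * join clauses are indexquals, a matched row stops after one fetched tuple
 * and an unmatched row finds no index entries, so charge about one tuple's
 * worth of run cost instead of a full run.
 */
static double
semi_anti_inner_cost_per_outer_row(const SketchInnerScan *inner,
								   bool all_clauses_are_indexquals)
{
	assert(inner->rows > 0);
	if (all_clauses_are_indexquals)
	{
		double		run_cost = inner->total_cost - inner->startup_cost;

		return inner->startup_cost + run_cost / inner->rows;
	}
	return inner->total_cost;
}
```

With an inner scan of startup 1, total 101 and 100 rows, the fast case charges 2 per outer row instead of 101, which is the gap that made low-cardinality indexscan plans look far too expensive.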
2015-06-03 17:58:47 +02:00
	joinrel->consider_param_startup = false;
Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker. Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation. That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL. The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now. Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0. Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 15:02:52 +01:00
	joinrel->consider_parallel = false;
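The cost_gather fix described in the parallel sequential scan commit message charges a Gather node's transfer overhead per tuple passed to the leader, not by relation size. What follows is a minimal sketch of that arithmetic. `sketch_cost_gather` and `GatherCostSketch` are hypothetical names, and the setup and per-tuple constants are passed in explicitly rather than read from the planner's configuration.

```c
#include <assert.h>

/* Hypothetical startup/total cost pair for one plan node. */
typedef struct GatherCostSketch
{
    double startup;
    double total;
} GatherCostSketch;

/*
 * Cost a Gather node placed over a parallel subpath.  The transfer overhead
 * scales with tuples_to_leader (rows actually shipped to the leader), not
 * with the size of the scanned relation.
 */
static GatherCostSketch
sketch_cost_gather(double sub_startup, double sub_total,
                   double tuples_to_leader,
                   double setup_cost, double per_tuple_cost)
{
    GatherCostSketch c;

    assert(tuples_to_leader >= 0.0);
    c.startup = sub_startup + setup_cost;
    c.total = sub_total + setup_cost + per_tuple_cost * tuples_to_leader;
    return c;
}
```

Under this model, a selective parallel scan that returns few rows to the leader incurs little transfer overhead, however large the table is.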
2016-03-14 21:59:59 +01:00
	joinrel->reltarget = create_empty_pathtarget();
2000-02-07 05:41:04 +01:00
	joinrel->pathlist = NIL;
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
	joinrel->ppilist = NIL;
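The parameterized-path commit message says that all paths with the same set of required outer rels share one cached rowcount estimate; ppilist holds those entries. What follows is a minimal sketch of such a cache. It assumes relid sets fit in a 64-bit mask rather than a Bitmapset, and `ParamInfoSketch`, `get_param_info`, and the demo estimator are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t Relids;        /* bit i set => rel i is in the set */

/* Hypothetical cache entry, loosely in the spirit of ParamPathInfo. */
typedef struct ParamInfoSketch
{
    Relids      required_outer; /* outer rels supplying parameters */
    double      rows;           /* rowcount shared by all such paths */
    struct ParamInfoSketch *next;
} ParamInfoSketch;

typedef double (*RowEstimator) (Relids required_outer);

/* Demo estimator: counts calls so the caching is observable. */
static int  estimator_calls = 0;

static double
demo_estimate(Relids required_outer)
{
    estimator_calls++;
    return required_outer ? 10.0 : 1000.0;
}

/*
 * Return the entry for this parameterization, estimating and caching the
 * rowcount on first use so every path with the same required_outer set
 * sees the same number.
 */
static ParamInfoSketch *
get_param_info(ParamInfoSketch **cache, Relids required_outer,
               RowEstimator estimate)
{
    ParamInfoSketch *p;

    for (p = *cache; p != NULL; p = p->next)
        if (p->required_outer == required_outer)
            return p;

    p = malloc(sizeof(*p));
    assert(p != NULL);
    p->required_outer = required_outer;
    p->rows = estimate(required_outer);
    p->next = *cache;
    *cache = p;
    return p;
}
```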
2016-01-20 20:29:22 +01:00
	joinrel->partial_pathlist = NIL;
2000-02-15 21:49:31 +01:00
	joinrel->cheapest_startup_path = NULL;
	joinrel->cheapest_total_path = NULL;
2003-01-20 19:55:07 +01:00
	joinrel->cheapest_unique_path = NULL;
2012-01-28 01:26:38 +01:00
	joinrel->cheapest_parameterized_paths = NIL;
2015-12-11 21:52:16 +01:00
	/* init direct_lateral_relids from children; we'll finish it up below */
	joinrel->direct_lateral_relids =
		bms_union(outer_rel->direct_lateral_relids,
				  inner_rel->direct_lateral_relids);
Still more fixes for planner's handling of LATERAL references.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references. If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too. The planner did not understand this and would generate
intermediate joins that could not be used. While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.
To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent. This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.
Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations. This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.
Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1dd02f1. Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures. The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen. I chose to move the
supporting logic into joinrels.c where the latter functions are.
Back-patch to 9.3 where LATERAL support was introduced.
2015-12-11 20:22:20 +01:00
	joinrel->lateral_relids = min_join_parameterization(root, joinrel->relids,
														outer_rel, inner_rel);
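The LATERAL commit message converts each rel's lateral_relids into a transitive closure: if X depends on Y and Y depends on Z, then X also depends on Z. What follows is a minimal fixpoint sketch over 64-bit masks, which stand in for Bitmapsets as an assumption. `close_lateral_relids` is a hypothetical helper, not the planner's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Relids;        /* bit j set => depends laterally on rel j */

/*
 * Replace each rel's direct lateral dependencies with their transitive
 * closure by repeatedly folding in the dependencies of every rel already
 * depended on, until a full pass changes nothing.
 */
static void
close_lateral_relids(Relids *lateral, int nrels)
{
    int         changed;

    assert(nrels >= 0 && nrels <= 64);
    do
    {
        changed = 0;
        for (int i = 0; i < nrels; i++)
        {
            Relids      closure = lateral[i];

            for (int j = 0; j < nrels; j++)
                if (lateral[i] & ((Relids) 1 << j))
                    closure |= lateral[j];
            if (closure != lateral[i])
            {
                lateral[i] = closure;
                changed = 1;
            }
        }
    } while (changed);
}
```

With rels X=0, Y=1, Z=2, where X references Y and Y references Z, X ends up depending on both Y and Z. This is the chained case the commit message describes.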
2003-02-08 21:20:55 +01:00
	joinrel->relid = 0;			/* indicates not a baserel */
2002-05-12 22:10:05 +02:00
	joinrel->rtekind = RTE_JOIN;
2003-06-30 01:05:05 +02:00
	joinrel->min_attr = 0;
	joinrel->max_attr = 0;
	joinrel->attr_needed = NULL;
	joinrel->attr_widths = NULL;
2012-08-27 04:48:55 +02:00
	joinrel->lateral_vars = NIL;
2013-08-18 02:22:37 +02:00
	joinrel->lateral_referencers = NULL;
2001-05-20 22:28:20 +02:00
	joinrel->indexlist = NIL;
2017-04-08 04:20:03 +02:00
	joinrel->statlist = NIL;
2000-02-07 05:41:04 +01:00
	joinrel->pages = 0;
	joinrel->tuples = 0;
2011-10-14 23:23:01 +02:00
	joinrel->allvisfrac = 0;
2011-09-03 21:35:12 +02:00
	joinrel->subroot = NULL;
Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evaluation mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
2012-09-05 18:54:03 +02:00
	joinrel->subplan_params = NIL;
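The PARAM_EXEC commit message notes that reserving a slot now only requires incrementing a counter. What follows is a minimal sketch with hypothetical names (`PlannerGlobalSketch`, `reserve_param_exec_slot`). Because no slot is ever handed out twice, Params with different lifespans cannot collide, at the cost of using more slots.

```c
#include <assert.h>

/* Hypothetical stand-in for planner-global state. */
typedef struct PlannerGlobalSketch
{
    int         nParamExec;     /* PARAM_EXEC slots handed out so far */
} PlannerGlobalSketch;

/* Reserve a fresh PARAM_EXEC slot number; slots are never reused. */
static int
reserve_param_exec_slot(PlannerGlobalSketch *glob)
{
    assert(glob->nParamExec >= 0);
    return glob->nParamExec++;
}
```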
Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings. Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design. Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value. If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is. Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.
This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied. We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID. In that case, mark the plan as only
runnable by that userID.
The plancache code already had a notion of plans being userID-specific,
in order to support RLS. It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID. Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.
Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one. It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.
In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.
This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.
Etsuro Fujita and Tom Lane
Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
2016-07-15 23:22:56 +02:00
	joinrel->rel_parallel_workers = -1;
Code review for foreign/custom join pushdown patch.
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments. Clean up
as follows:
* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function. In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs. Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.
* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.
* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that. Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.
* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries. The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks. It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway. I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.
* Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.
* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.
* Lots of cleanup of documentation and missed comments. Re-order some
code additions into more logical places.
2015-05-10 20:36:30 +02:00
	joinrel->serverid = InvalidOid;
2016-07-15 23:22:56 +02:00
	joinrel->userid = InvalidOid;
	joinrel->useridiscurrent = false;
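The commit "Avoid invalidating all foreign-join cached plans when user mappings change" allows a foreign join when both RTEs have the same checkAsUser. It also allows a join where one value is zero and the other is nonzero, provided the nonzero one equals the current effective user; in that case the plan is marked as usable only by that user. What follows is a minimal sketch of the rule as stated in that message. `OidSketch`, `JoinUserCheck`, and `check_foreign_join_users` are hypothetical and do not reproduce PostgreSQL's code.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int OidSketch;      /* stand-in for an OID */
#define INVALID_OID_SKETCH ((OidSketch) 0)

/* Outcome of the check: may the join be pushed down, and as whom? */
typedef struct JoinUserCheck
{
    bool        ok;
    OidSketch   userid;           /* user the remote join will run as */
    bool        useridiscurrent;  /* plan valid only for the current user */
} JoinUserCheck;

/*
 * outer_user/inner_user are the RTEs' checkAsUser values (0 = "whoever
 * runs the query"); current_user is the prevailing effective user ID.
 */
static JoinUserCheck
check_foreign_join_users(OidSketch outer_user, OidSketch inner_user,
                         OidSketch current_user)
{
    JoinUserCheck r = {false, INVALID_OID_SKETCH, false};
    OidSketch   nonzero;

    assert(current_user != INVALID_OID_SKETCH);
    if (outer_user == inner_user)
    {
        /* Same checkAsUser: necessarily the same user mapping at runtime. */
        r.ok = true;
        r.userid = outer_user;
        return r;
    }

    /* Mixed zero/nonzero is allowed only if the nonzero one is us. */
    nonzero = (outer_user != INVALID_OID_SKETCH) ? outer_user : inner_user;
    if ((outer_user == INVALID_OID_SKETCH || inner_user == INVALID_OID_SKETCH) &&
        nonzero == current_user)
    {
        r.ok = true;
        r.userid = nonzero;
        r.useridiscurrent = true;
    }
    return r;
}
```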
Revise FDW planning API, again.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path. Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.
Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.
In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution. We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker). Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)
Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
2012-03-09 18:48:48 +01:00
	joinrel->fdwroutine = NULL;
	joinrel->fdw_private = NULL;
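The FDW API commit message splits planning into GetForeignRelSize, GetForeignPaths, and GetForeignPlan, and uses fdw_private so that later steps can reuse work done by earlier ones. What follows is a minimal sketch of that hand-off. The structs and `sketch_*` functions are hypothetical stand-ins rather than the real callback signatures, and the "remote" estimate is simulated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for planner structures. */
typedef struct RelSketch
{
    double      rows;
    void       *fdw_private;    /* FDW-owned state shared across steps */
} RelSketch;

typedef struct RemoteEstimate
{
    double      remote_rows;
    int         fetch_size;
} RemoteEstimate;

static int  remote_round_trips = 0; /* counts simulated remote queries */

/* Step 1 (cf. GetForeignRelSize): estimate once and stash the details. */
static void
sketch_get_rel_size(RelSketch *rel)
{
    RemoteEstimate *est = malloc(sizeof(*est));

    assert(est != NULL);
    remote_round_trips++;       /* pretend we asked the remote server */
    est->remote_rows = 500.0;
    est->fetch_size = 100;
    rel->rows = est->remote_rows;
    rel->fdw_private = est;
}

/* Step 2 (cf. GetForeignPaths): cost paths from the stashed estimate. */
static double
sketch_get_paths(const RelSketch *rel)
{
    const RemoteEstimate *est = rel->fdw_private;

    return est->remote_rows / est->fetch_size;  /* number of fetches */
}

/* Step 3 (cf. GetForeignPlan): build plan details from the same state. */
static int
sketch_get_plan(const RelSketch *rel)
{
    const RemoteEstimate *est = rel->fdw_private;

    return est->fetch_size;
}
```

The remote server is queried only once, even though all three steps need the estimate.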
2017-04-08 04:20:03 +02:00
	joinrel->unique_for_rels = NIL;
	joinrel->non_unique_for_rels = NIL;
2000-02-07 05:41:04 +01:00
	joinrel->baserestrictinfo = NIL;
2003-01-12 23:35:29 +01:00
	joinrel->baserestrictcost.startup = 0;
	joinrel->baserestrictcost.per_tuple = 0;
Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 18:58:20 +01:00
	joinrel->baserestrict_min_security = UINT_MAX;
2000-02-07 05:41:04 +01:00
	joinrel->joininfo = NIL;
2007-01-20 21:45:41 +01:00
	joinrel->has_eclass_joins = false;
Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-04 04:41:31 +02:00
	joinrel->top_parent_relids = NULL;
2017-09-21 05:33:04 +02:00
	joinrel->part_scheme = NULL;
	joinrel->nparts = 0;
	joinrel->boundinfo = NULL;
	joinrel->part_rels = NULL;
	joinrel->partexprs = NULL;
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 17:11:10 +02:00
	joinrel->nullable_partexprs = NULL;
2000-02-07 05:41:04 +01:00
2017-03-14 23:20:17 +01:00
	/* Compute information relevant to the foreign relations. */
	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
2015-05-10 20:36:30 +02:00
2000-02-07 05:41:04 +01:00
	/*
2005-10-15 04:49:52 +02:00
	 * Create a new tlist containing just the vars that need to be output from
	 * this join (ie, are needed for higher joinclauses or final output).
2005-06-06 06:13:36 +02:00
	 *
2005-10-15 04:49:52 +02:00
	 * NOTE: the tlist order for a join rel will depend on which pair of outer
	 * and inner rels we first try to build it from. But the contents should
	 * be the same regardless.
2000-02-07 05:41:04 +01:00
	 */
2005-06-06 06:13:36 +02:00
	build_joinrel_tlist(root, joinrel, outer_rel);
	build_joinrel_tlist(root, joinrel, inner_rel);
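The comment above says a join's tlist keeps only the vars needed by higher joinclauses or by the final output. What follows is a minimal sketch of that filter using 64-bit relid masks. `VarSketch` and `sketch_joinrel_tlist` are hypothetical, and reserving bit 0 to mean "final output" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Relids;        /* bit i set => rel i; bit 0 = final output */

/* Hypothetical Var: owning rel plus where its value is still needed. */
typedef struct VarSketch
{
    int         relno;
    Relids      attr_needed;
} VarSketch;

/*
 * Keep a Var in the join's output only if something outside the join
 * still needs it.  Writes the kept Vars' indexes to out, in input order,
 * and returns how many were kept.
 */
static int
sketch_joinrel_tlist(const VarSketch *vars, int nvars, Relids join_relids,
                     int *out)
{
    int         n = 0;

    assert(nvars >= 0);
    for (int i = 0; i < nvars; i++)
        if ((vars[i].attr_needed & ~join_relids) != 0)
            out[n++] = i;
    return n;
}
```

The code makes two such passes, one over the outer rel's Vars and one over the inner rel's. As a result, the output order depends on which input is processed first, as the NOTE in the comment says, while the set of kept Vars does not.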
Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-19 02:01:49 +01:00
	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
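The PathTarget commit message charges a PlaceHolderVar's evaluation cost to the first plan level that can compute it. The cost is recorded in the target's cost field and then added to each Path's cost estimate. What follows is a minimal sketch of that bookkeeping. `PathTargetSketch` and both helpers are hypothetical simplifications.

```c
#include <assert.h>

/* Hypothetical PathTarget: output width plus tlist evaluation cost. */
typedef struct PathTargetSketch
{
    int         width;          /* estimated output tuple width, bytes */
    double      cost_startup;
    double      cost_per_tuple; /* cost to compute the tlist, per row */
} PathTargetSketch;

/* Charge a PlaceHolderVar's expression where it is first computed. */
static void
add_phv_cost(PathTargetSketch *target, int phv_width,
             double phv_startup, double phv_per_tuple)
{
    assert(phv_width >= 0);
    target->width += phv_width;
    target->cost_startup += phv_startup;
    target->cost_per_tuple += phv_per_tuple;
}

/* Fold the target's evaluation cost into a path's total cost. */
static double
path_total_with_target(double path_total, double rows,
                       const PathTargetSketch *target)
{
    return path_total + target->cost_startup + target->cost_per_tuple * rows;
}
```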
2000-02-07 05:41:04 +01:00
2015-12-11 21:52:16 +01:00
	/*
	 * add_placeholders_to_joinrel also took care of adding the ph_lateral
	 * sets of any PlaceHolderVars computed here to direct_lateral_relids, so
	 * now we can finish computing that. This is much like the computation of
	 * the transitively-closed lateral_relids in min_join_parameterization,
	 * except that here we *do* have to consider the added PHVs.
	 */
	joinrel->direct_lateral_relids =
		bms_del_members(joinrel->direct_lateral_relids, joinrel->relids);
	if (bms_is_empty(joinrel->direct_lateral_relids))
		joinrel->direct_lateral_relids = NULL;
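The comment above finishes direct_lateral_relids by dropping rels that are now inside the join. What follows is a minimal sketch that uses 64-bit masks in place of Bitmapsets, which is an assumption of the sketch. With a mask, 0 is the only representation of an empty set, so the separate bms_is_empty()/NULL normalization step has no counterpart here. `finish_direct_lateral_relids` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Relids;        /* simplified relid set */

/*
 * A join's direct lateral references: those of its inputs plus those of
 * PlaceHolderVars computed at the join, minus rels now inside the join.
 */
static Relids
finish_direct_lateral_relids(Relids outer_direct, Relids inner_direct,
                             Relids phv_lateral, Relids join_relids)
{
    Relids      result = outer_direct | inner_direct | phv_lateral;

    result &= ~join_relids;     /* plays the role of bms_del_members() */
    assert((result & join_relids) == 0);
    return result;
}
```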
2000-02-07 05:41:04 +01:00
	/*
2000-04-12 19:17:23 +02:00
	 * Construct restrict and join clause lists for the new joinrel. (The
2005-10-15 04:49:52 +02:00
	 * caller might or might not need the restrictlist, but I need it anyway
	 * for set_joinrel_size_estimates().)
2000-02-07 05:41:04 +01:00
	 */
2007-01-20 21:45:41 +01:00
	restrictlist = build_joinrel_restrictlist(root, joinrel,
											  outer_rel, inner_rel);
2000-02-07 05:41:04 +01:00
	if (restrictlist_ptr)
		*restrictlist_ptr = restrictlist;
	build_joinrel_joinlist(joinrel, outer_rel, inner_rel);
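build_joinrel_restrictlist and build_joinrel_joinlist split the inputs' join clauses into two groups: those that can be checked at this join and those that must wait for a higher join. What follows is a minimal sketch of that split. As a simplification, it assumes the test is whether all of a clause's required rels lie inside the new join. `ClauseSketch` and `split_join_clauses` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Relids;        /* simplified relid set */

/* Hypothetical clause: just the rels it needs before it can be checked. */
typedef struct ClauseSketch
{
    Relids      required_relids;
} ClauseSketch;

/*
 * Set applies_here[i] to 1 if clause i can be applied at this join
 * (restrictlist), or 0 if it must wait for a higher join (joininfo).
 * Returns the number that apply here.
 */
static int
split_join_clauses(const ClauseSketch *clauses, int nclauses,
                   Relids join_relids, int *applies_here)
{
    int         nhere = 0;

    assert(nclauses >= 0);
    for (int i = 0; i < nclauses; i++)
    {
        /* subset test: no required rel lies outside the join */
        applies_here[i] = (clauses[i].required_relids & ~join_relids) == 0;
        nhere += applies_here[i];
    }
    return nhere;
}
```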
2007-01-20 21:45:41 +01:00
	/*
	 * This is also the right place to check whether the joinrel has any
	 * pending EquivalenceClass joins.
	 */
	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
2017-10-06 17:11:10 +02:00
|
|
|
	/* Store the partition information. */
	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
								 sjinfo->jointype);

	/*
	 * Set estimates of the joinrel's size.
	 */
	set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
							   sjinfo, restrictlist);

	/*
	 * Set the consider_parallel flag if this joinrel could potentially be
	 * scanned within a parallel worker. If this flag is false for either
	 * inner_rel or outer_rel, then it must be false for the joinrel also.
	 * Even if both are true, there might be parallel-restricted expressions
	 * in the targetlist or quals.
	 *
	 * Note that if there are more than two rels in this relation, they could
	 * be divided between inner_rel and outer_rel in any arbitrary way. We
	 * assume this doesn't matter, because we should hit all the same baserels
	 * and joinclauses while building up to this joinrel no matter which we
	 * take; therefore, we should make the same decision here however we get
	 * here.
	 */
	if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
		is_parallel_safe(root, (Node *) restrictlist) &&
		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
		joinrel->consider_parallel = true;

	/* Add the joinrel to the PlannerInfo. */
	add_join_rel(root, joinrel);

	/*
	 * Also, if dynamic-programming join search is active, add the new joinrel
	 * to the appropriate sublist. Note: you might think the Assert on number
	 * of members should be for equality, but some of the level 1 rels might
	 * have been joinrels already, so we can only assert <=.
	 */
	if (root->join_rel_level)
	{
		Assert(root->join_cur_level > 0);
		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
		root->join_rel_level[root->join_cur_level] =
			lappend(root->join_rel_level[root->join_cur_level], joinrel);
	}

	return joinrel;
}

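The Assert on the join level in build_join_rel() uses <= rather than == because a level-1 input may already be a joinrel. What follows is a minimal sketch of that bookkeeping, not planner code. It assumes Relids can be modeled as a uint64_t bitmask and the join_rel_level sublists as fixed-size arrays. ToyJoinRel, ToyLevels, toy_num_members() and toy_add_to_level() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: a joinrel is reduced to its set of base-rel
 * indexes, packed into a 64-bit mask (a stand-in for Relids). */
typedef struct ToyJoinRel
{
	uint64_t	relids;
} ToyJoinRel;

#define TOY_MAX_LEVELS 16
#define TOY_MAX_PER_LEVEL 64

/* Hypothetical stand-in for root->join_rel_level[]: one sublist per level. */
typedef struct ToyLevels
{
	ToyJoinRel *rels[TOY_MAX_LEVELS + 1][TOY_MAX_PER_LEVEL];
	int			nrels[TOY_MAX_LEVELS + 1];
} ToyLevels;

/* Count members of a set (analogous in spirit to bms_num_members()). */
static int
toy_num_members(uint64_t set)
{
	int			n = 0;

	while (set)
	{
		set &= set - 1;			/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Append joinrel to the sublist for cur_level.  The level must be positive,
 * and it may be smaller than the member count, because some level-1 inputs
 * can already be joinrels; hence <= rather than ==.
 */
static void
toy_add_to_level(ToyLevels *levels, int cur_level, ToyJoinRel *joinrel)
{
	assert(cur_level > 0 && cur_level <= TOY_MAX_LEVELS);
	assert(cur_level <= toy_num_members(joinrel->relids));
	assert(levels->nrels[cur_level] < TOY_MAX_PER_LEVEL);
	levels->rels[cur_level][levels->nrels[cur_level]++] = joinrel;
}
```

In this model, adding a three-member joinrel at level 2 passes the check, which is exactly the case the comment describes. An equality check would reject it.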
/*
 * build_child_join_rel
 *	  Builds a RelOptInfo representing the join between two given child
 *	  relations.
 *
 * 'outer_rel' and 'inner_rel' are the RelOptInfos of the child relations
 *		being joined
 * 'parent_joinrel' is the RelOptInfo representing the join between the
 *		parent relations. Some of the members of the new RelOptInfo are
 *		produced by translating the corresponding members of this RelOptInfo
 * 'sjinfo': child-join context info
 * 'restrictlist': list of RestrictInfo nodes that apply to this particular
 *		pair of joinable relations
 * 'jointype' is the join type (inner, left, full, etc.)
 */
RelOptInfo *
build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
					 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
					 List *restrictlist, SpecialJoinInfo *sjinfo,
					 JoinType jointype)
{
	RelOptInfo *joinrel = makeNode(RelOptInfo);
	AppendRelInfo **appinfos;
	int			nappinfos;

	/* Only joins between "other" relations land here. */
	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));

	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
	joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids);
	joinrel->rows = 0;
	/* cheap startup cost is interesting iff not all tuples to be retrieved */
	joinrel->consider_startup = (root->tuple_fraction > 0);
	joinrel->consider_param_startup = false;
	joinrel->consider_parallel = false;
	joinrel->reltarget = create_empty_pathtarget();
	joinrel->pathlist = NIL;
	joinrel->ppilist = NIL;
	joinrel->partial_pathlist = NIL;
	joinrel->cheapest_startup_path = NULL;
	joinrel->cheapest_total_path = NULL;
	joinrel->cheapest_unique_path = NULL;
	joinrel->cheapest_parameterized_paths = NIL;
	joinrel->direct_lateral_relids = NULL;
	joinrel->lateral_relids = NULL;
	joinrel->relid = 0;			/* indicates not a baserel */
	joinrel->rtekind = RTE_JOIN;
	joinrel->min_attr = 0;
	joinrel->max_attr = 0;
	joinrel->attr_needed = NULL;
	joinrel->attr_widths = NULL;
	joinrel->lateral_vars = NIL;
	joinrel->lateral_referencers = NULL;
	joinrel->indexlist = NIL;
	joinrel->pages = 0;
	joinrel->tuples = 0;
	joinrel->allvisfrac = 0;
	joinrel->subroot = NULL;
	joinrel->subplan_params = NIL;
	joinrel->serverid = InvalidOid;
	joinrel->userid = InvalidOid;
	joinrel->useridiscurrent = false;
	joinrel->fdwroutine = NULL;
	joinrel->fdw_private = NULL;
	joinrel->baserestrictinfo = NIL;
	joinrel->baserestrictcost.startup = 0;
	joinrel->baserestrictcost.per_tuple = 0;
	joinrel->joininfo = NIL;
	joinrel->has_eclass_joins = false;
	joinrel->top_parent_relids = NULL;
	joinrel->part_scheme = NULL;
	joinrel->part_rels = NULL;
	joinrel->partexprs = NULL;
	joinrel->nullable_partexprs = NULL;

	joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
										   inner_rel->top_parent_relids);

	/* Compute information relevant to foreign relations. */
	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);

	/* Build targetlist */
	build_joinrel_tlist(root, joinrel, outer_rel);
	build_joinrel_tlist(root, joinrel, inner_rel);
	/* Add placeholder variables. */
	add_placeholders_to_child_joinrel(root, joinrel, parent_joinrel);

	/* Construct joininfo list. */
	appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
	joinrel->joininfo = (List *) adjust_appendrel_attrs(root,
														(Node *) parent_joinrel->joininfo,
														nappinfos,
														appinfos);
	pfree(appinfos);

	/*
	 * Lateral relids referenced in the child join are the same as those
	 * referenced in the parent relation. Throw away any partial result
	 * computed while building the targetlist.
	 */
	bms_free(joinrel->direct_lateral_relids);
	bms_free(joinrel->lateral_relids);
	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);

	/*
	 * If the parent joinrel has pending equivalence classes, so does the
	 * child.
	 */
	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;

	/* Is the join between partitions itself partitioned? */
	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
								 jointype);

	/* Child joinrel is parallel safe if parent is parallel safe. */
	joinrel->consider_parallel = parent_joinrel->consider_parallel;

	/* Set estimates of the child-joinrel's size. */
	set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
							   sjinfo, restrictlist);

	/* We build the join only once. */
	Assert(!find_join_rel(root, joinrel->relids));

	/* Add the relation to the PlannerInfo. */
	add_join_rel(root, joinrel);

	return joinrel;
}

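build_child_join_rel() computes some fields of the child joinrel from its two child inputs and simply inherits others from the parent joinrel. The sketch below illustrates that split under simplifying assumptions. ToyRel and toy_build_child_join() are hypothetical, Relids are modeled as uint64_t bitmasks, and the many fields the real function initializes are omitted. Those omitted fields include paths, costs, FDW info, partition info and the joininfo translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
	uint64_t	relids;
	uint64_t	top_parent_relids;
	uint64_t	lateral_relids;
	bool		consider_parallel;
	bool		has_eclass_joins;
} ToyRel;

/*
 * relids and top_parent_relids are unions over the two child inputs;
 * lateral_relids, consider_parallel and has_eclass_joins are copied from
 * the parent joinrel instead of being recomputed.
 */
static ToyRel
toy_build_child_join(const ToyRel *outer_child, const ToyRel *inner_child,
					 const ToyRel *parent_join)
{
	ToyRel		child = {0};

	/* Sanity check for the toy model: join inputs never overlap. */
	assert((outer_child->relids & inner_child->relids) == 0);

	child.relids = outer_child->relids | inner_child->relids;
	child.top_parent_relids = outer_child->top_parent_relids |
		inner_child->top_parent_relids;
	child.lateral_relids = parent_join->lateral_relids;
	child.consider_parallel = parent_join->consider_parallel;
	child.has_eclass_joins = parent_join->has_eclass_joins;
	return child;
}
```

The child inherits the parent's flags even when both child inputs would individually allow parallelism. That matches the comment "Child joinrel is parallel safe if parent is parallel safe."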
/*
 * min_join_parameterization
 *
 * Determine the minimum possible parameterization of a joinrel, that is, the
 * set of other rels it contains LATERAL references to. We save this value in
 * the join's RelOptInfo. This function is split out of build_join_rel()
 * because join_is_legal() needs the value to check a prospective join.
 */
Relids
min_join_parameterization(PlannerInfo *root,
						  Relids joinrelids,
						  RelOptInfo *outer_rel,
						  RelOptInfo *inner_rel)
{
	Relids		result;

	/*
	 * Basically we just need the union of the inputs' lateral_relids, less
	 * whatever is already in the join.
	 *
	 * It's not immediately obvious that this is a valid way to compute the
	 * result, because it might seem that we're ignoring possible lateral refs
	 * of PlaceHolderVars that are due to be computed at the join but not in
	 * either input. However, because create_lateral_join_info() already
	 * charged all such PHV refs to each member baserel of the join, they'll
	 * be accounted for already in the inputs' lateral_relids. Likewise, we
	 * do not need to worry about doing transitive closure here, because that
	 * was already accounted for in the original baserel lateral_relids.
	 */
	result = bms_union(outer_rel->lateral_relids, inner_rel->lateral_relids);
	result = bms_del_members(result, joinrelids);

	/* Maintain invariant that result is exactly NULL if empty */
	if (bms_is_empty(result))
		result = NULL;

	return result;
}

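In effect, min_join_parameterization() is a set union followed by a set difference. The following is a minimal sketch that models Relids as a uint64_t bitmask, with 0 playing the role of the empty (NULL) set. ToyRelids and toy_min_join_parameterization() are hypothetical names. The real function operates on Bitmapset via bms_union() and bms_del_members().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means base rel i is a member.
 * Zero plays the role of the NULL (empty) bitmapset. */
typedef uint64_t ToyRelids;

/*
 * Union of the inputs' lateral references, minus the rels already inside
 * the join.  With a plain integer, "exactly NULL if empty" becomes
 * "exactly 0 if empty", which the mask arithmetic guarantees by itself.
 */
static ToyRelids
toy_min_join_parameterization(ToyRelids joinrelids,
							  ToyRelids outer_lateral,
							  ToyRelids inner_lateral)
{
	ToyRelids	result = outer_lateral | inner_lateral;

	result &= ~joinrelids;
	/* Postcondition: nothing inside the join remains a lateral dependency. */
	assert((result & joinrelids) == 0);
	return result;
}
```

For example, suppose the join covers rels 1 and 2 (mask 0x6) and its inputs laterally reference rels 3 and 2. The result is then {3} (mask 0x8), because the reference to rel 2 is already satisfied inside the join.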
/*
 * build_joinrel_tlist
 *	  Builds a join relation's target list from an input relation.
 *	  (This is invoked twice to handle the two input relations.)
 *
 * The join's targetlist includes all Vars of its member relations that
 * will still be needed above the join. This subroutine adds all such
 * Vars from the specified input rel's tlist to the join rel's tlist.
 *
 * We also compute the expected width of the join's output, making use
 * of data that was cached at the baserel level by set_rel_width().
 */
static void
build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
					RelOptInfo *input_rel)
{
	Relids		relids;
	ListCell   *vars;

	/* attrs_needed refers to parent relids and not those of a child. */
	if (joinrel->top_parent_relids)
		relids = joinrel->top_parent_relids;
	else
		relids = joinrel->relids;

	foreach(vars, input_rel->reltarget->exprs)
	{
		Var		   *var = (Var *) lfirst(vars);
		RelOptInfo *baserel;
		int			ndx;

		/*
		 * Ignore PlaceHolderVars in the input tlists; we'll make our own
		 * decisions about whether to copy them.
		 */
		if (IsA(var, PlaceHolderVar))
			continue;

2006-01-31 22:39:25 +01:00
|
|
|
/*
|
2017-11-29 15:24:24 +01:00
|
|
|
* Otherwise, anything in a baserel or joinrel targetlist ought to be
|
|
|
|
* a Var. Children of a partitioned table may have ConvertRowtypeExpr
|
|
|
|
* translating whole-row Var of a child to that of the parent.
|
|
|
|
* Children of an inherited table or subquery child rels can not
|
|
|
|
* directly participate in a join, so other kinds of nodes here.
|
2006-01-31 22:39:25 +01:00
|
|
|
*/
|
2017-10-06 17:11:10 +02:00
		if (IsA(var, Var))
		{
			baserel = find_base_rel(root, var->varno);
			ndx = var->varattno - baserel->min_attr;
		}
		else if (IsA(var, ConvertRowtypeExpr))
		{
			ConvertRowtypeExpr *child_expr = (ConvertRowtypeExpr *) var;
			Var *childvar = (Var *) child_expr->arg;

			/*
			 * Child's whole-row references are converted to look like those
			 * of the parent using ConvertRowtypeExpr. There can be as many
			 * ConvertRowtypeExpr decorations as the depth of the partition
			 * tree. The argument to the deepest ConvertRowtypeExpr is
			 * expected to be a whole-row reference of the child.
			 */
			while (IsA(childvar, ConvertRowtypeExpr))
			{
				child_expr = (ConvertRowtypeExpr *) childvar;
				childvar = (Var *) child_expr->arg;
			}
2017-11-29 15:24:24 +01:00
			Assert(IsA(childvar, Var) && childvar->varattno == 0);
2017-10-06 17:11:10 +02:00

			baserel = find_base_rel(root, childvar->varno);
			ndx = 0 - baserel->min_attr;
		}
		else
Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-19 02:01:49 +01:00
			elog(ERROR, "unexpected node type in rel targetlist: %d",
2012-08-27 04:48:55 +02:00
				 (int) nodeTag(var));
2003-06-30 01:05:05 +02:00

2017-10-06 17:11:10 +02:00
		/* Is the target expression still needed above this joinrel? */
2005-06-06 06:13:36 +02:00
		if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
2003-06-30 01:05:05 +02:00
		{
2005-06-06 06:13:36 +02:00
			/* Yup, add it to the output */
2016-03-14 21:59:59 +01:00
			joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, var);
2017-10-06 17:11:10 +02:00

			/*
			 * Vars have cost zero, so no need to adjust reltarget->cost. Even
			 * if it's a ConvertRowtypeExpr, it will be computed only for the
			 * base relation, costing nothing for a join.
			 */
2016-03-14 21:59:59 +01:00
			joinrel->reltarget->width += baserel->attr_widths[ndx];
2003-06-30 01:05:05 +02:00
		}
2000-02-07 05:41:04 +01:00
	}
}
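The loop above does two things for each target entry. First, it resolves the entry to a base relation and an index into `attr_needed[]`, peeling any ConvertRowtypeExpr wrappers down to the child's whole-row Var. Second, it keeps the entry only if some relation outside the joinrel still needs it. Below is a minimal standalone sketch of that logic. It assumes Relids sets are modeled as 64-bit masks (relid i is bit i). The types `ToyNode` and `ToyRel` and the helpers `resolve_attr_index()` and `needed_above()` are hypothetical illustrations, not PostgreSQL APIs, and `bms_nonempty_difference(a, b)` is approximated by `(a & ~b) != 0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Var and ConvertRowtypeExpr (not PostgreSQL types). */
typedef enum { TOY_VAR, TOY_CONVERT_ROWTYPE } ToyTag;

typedef struct ToyNode
{
	ToyTag tag;
	int varno;              /* TOY_VAR only: index of the base relation */
	int varattno;           /* TOY_VAR only: 0 means whole-row reference */
	struct ToyNode *arg;    /* TOY_CONVERT_ROWTYPE only: wrapped expression */
} ToyNode;

/* Hypothetical per-baserel data; attr_needed[] is indexed by attno - min_attr. */
typedef struct ToyRel
{
	int min_attr;                   /* lowest attribute number (system columns are negative) */
	const uint64_t *attr_needed;    /* bitmask of relids that need each attribute */
} ToyRel;

/*
 * Peel ConvertRowtypeExpr-like wrappers, then return the index into
 * attr_needed[]; *varno receives the base relation's index.
 */
static int
resolve_attr_index(const ToyNode *node, const ToyRel *rels, int *varno)
{
	bool wrapped = false;

	while (node->tag == TOY_CONVERT_ROWTYPE)
	{
		node = node->arg;
		wrapped = true;
	}
	assert(node->tag == TOY_VAR);
	/* The deepest wrapped argument must be a whole-row reference (attno 0). */
	assert(!wrapped || node->varattno == 0);
	*varno = node->varno;
	return node->varattno - rels[node->varno].min_attr;
}

/* Like bms_nonempty_difference(): is the attribute needed by a rel outside the joinrel? */
static bool
needed_above(const ToyRel *rel, int ndx, uint64_t joinrel_relids)
{
	return (rel->attr_needed[ndx] & ~joinrel_relids) != 0;
}
```

The masks keep the set test to a single expression. The real Bitmapset type is not limited to 64 members.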

/*
 * build_joinrel_restrictlist
 * build_joinrel_joinlist
 *   These routines build lists of restriction and join clauses for a
 *   join relation from the joininfo lists of the relations it joins.
 *
 *   These routines are separate because the restriction list must be
 *   built afresh for each pair of input sub-relations we consider, whereas
2005-06-09 06:19:00 +02:00
 *   the join list need only be computed once for any join RelOptInfo.
 *   The join list is fully determined by the set of rels making up the
2000-02-07 05:41:04 +01:00
 *   joinrel, so we should get the same results (up to ordering) from any
2014-05-06 18:12:18 +02:00
 *   candidate pair of sub-relations. But the restriction list is whatever
2000-02-07 05:41:04 +01:00
 *   is not handled in the sub-relations, so it depends on which
 *   sub-relations are considered.
 *
 *   If a join clause from an input relation refers to base rels still not
 *   present in the joinrel, then it is still a join clause for the joinrel;
2005-06-09 06:19:00 +02:00
 *   we put it into the joininfo list for the joinrel. Otherwise,
2000-02-07 05:41:04 +01:00
 *   the clause is now a restrict clause for the joined relation, and we
 *   return it to the caller of build_joinrel_restrictlist() to be stored in
2014-05-06 18:12:18 +02:00
 *   join paths made from this pair of sub-relations. (It will not need to
2000-02-07 05:41:04 +01:00
 *   be considered further up the join tree.)
 *
2007-01-20 21:45:41 +01:00
 *   In many cases we will find the same RestrictInfos in both input
 *   relations' joinlists, so be careful to eliminate duplicates.
 *   Pointer equality should be a sufficient test for dups, since all
 *   the various joinlist entries ultimately refer to RestrictInfos
 *   pushed into them by distribute_restrictinfo_to_rels().
2001-10-18 18:11:42 +02:00
 *
2000-02-07 05:41:04 +01:00
 * 'joinrel' is a join relation node
 * 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
 *   to form joinrel.
 *
 * build_joinrel_restrictlist() returns a list of relevant restrictinfos,
 * whereas build_joinrel_joinlist() stores its results in the joinrel's
2005-06-09 06:19:00 +02:00
 * joininfo list. One or the other must accept each given clause!
2000-02-07 05:41:04 +01:00
 *
 * NB: Formerly, we made deep(!) copies of each input RestrictInfo to pass
 * up to the join relation. I believe this is no longer necessary, because
2014-05-06 18:12:18 +02:00
 * RestrictInfo nodes are no longer context-dependent. Instead, just include
2000-02-07 05:41:04 +01:00
 * the original nodes in the lists made for the join relation.
 */
static List *
2005-06-06 00:32:58 +02:00
build_joinrel_restrictlist(PlannerInfo *root,
2001-10-18 18:11:42 +02:00
						   RelOptInfo *joinrel,
2000-02-07 05:41:04 +01:00
						   RelOptInfo *outer_rel,
2007-01-20 21:45:41 +01:00
						   RelOptInfo *inner_rel)
2000-02-07 05:41:04 +01:00
{
2002-11-24 22:52:15 +01:00
	List *result;
2001-10-18 18:11:42 +02:00

	/*
2007-01-20 21:45:41 +01:00
	 * Collect all the clauses that syntactically belong at this level,
	 * eliminating any duplicates (important since we will see many of the
	 * same clauses arriving from both input relations).
2001-10-18 18:11:42 +02:00
	 */
2007-01-20 21:45:41 +01:00
	result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
	result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
2007-11-15 22:14:46 +01:00

2000-02-07 05:41:04 +01:00
	/*
2014-05-06 18:12:18 +02:00
	 * Add on any clauses derived from EquivalenceClasses. These cannot be
2007-01-20 21:45:41 +01:00
	 * redundant with the clauses in the joininfo lists, so don't bother
	 * checking.
2000-02-07 05:41:04 +01:00
	 */
2007-01-20 21:45:41 +01:00
	result = list_concat(result,
						 generate_join_implied_equalities(root,
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
														  joinrel->relids,
														  outer_rel->relids,
2007-01-20 21:45:41 +01:00
														  inner_rel));
2001-10-18 18:11:42 +02:00

	return result;
2000-02-07 05:41:04 +01:00
}
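build_joinrel_restrictlist() gathers clauses from both inputs' joininfo lists. It relies on pointer identity to drop duplicates, because the same RestrictInfo node is linked into several lists rather than copied. Here is a small sketch of that dedup idea. It assumes a hypothetical fixed-capacity `PtrList` in place of PostgreSQL's List, and a hypothetical `append_unique_ptr()` that mimics how list_append_unique_ptr() is used above: append only if the identical pointer is not already present.

```c
#include <assert.h>
#include <stddef.h>

#define PTRLIST_MAX 16

/* Hypothetical fixed-capacity list of pointers (not PostgreSQL's List). */
typedef struct PtrList
{
	const void *items[PTRLIST_MAX];
	size_t len;
} PtrList;

/* Append p unless the identical pointer is already in the list. */
static void
append_unique_ptr(PtrList *list, const void *p)
{
	for (size_t i = 0; i < list->len; i++)
	{
		if (list->items[i] == p)
			return;             /* same node already collected */
	}
	assert(list->len < PTRLIST_MAX);
	list->items[list->len++] = p;
}

/* Merge both inputs' clause lists, keeping the first occurrence of each node. */
static void
merge_clause_lists(PtrList *result, const PtrList *outer, const PtrList *inner)
{
	for (size_t i = 0; i < outer->len; i++)
		append_unique_ptr(result, outer->items[i]);
	for (size_t i = 0; i < inner->len; i++)
		append_unique_ptr(result, inner->items[i]);
}
```

Two distinct objects that happen to hold equal values both survive the merge. Per the header comment, that is acceptable here because duplicates arise only from shared links, never from separately built copies.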

static void
build_joinrel_joinlist(RelOptInfo *joinrel,
					   RelOptInfo *outer_rel,
					   RelOptInfo *inner_rel)
{
2007-01-20 21:45:41 +01:00
	List *result;

	/*
	 * Collect all the clauses that syntactically belong above this level,
	 * eliminating any duplicates (important since we will see many of the
	 * same clauses arriving from both input relations).
	 */
	result = subbuild_joinrel_joinlist(joinrel, outer_rel->joininfo, NIL);
	result = subbuild_joinrel_joinlist(joinrel, inner_rel->joininfo, result);

	joinrel->joininfo = result;
2000-02-07 05:41:04 +01:00
}

static List *
subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
2007-01-20 21:45:41 +01:00
							  List *joininfo_list,
							  List *new_restrictlist)
2000-02-07 05:41:04 +01:00
{
2005-06-09 06:19:00 +02:00
	ListCell *l;
2000-02-07 05:41:04 +01:00

2005-06-09 06:19:00 +02:00
	foreach(l, joininfo_list)
2000-02-07 05:41:04 +01:00
	{
2005-06-09 06:19:00 +02:00
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
2000-02-07 05:41:04 +01:00

2005-06-09 06:19:00 +02:00
		if (bms_is_subset(rinfo->required_relids, joinrel->relids))
2000-02-07 05:41:04 +01:00
		{
			/*
2005-10-15 04:49:52 +02:00
			 * This clause becomes a restriction clause for the joinrel, since
2007-01-20 21:45:41 +01:00
			 * it refers to no outside rels. Add it to the list, being
			 * careful to eliminate duplicates. (Since RestrictInfo nodes in
			 * different joinlists will have been multiply-linked rather than
			 * copied, pointer equality should be a sufficient test.)
2000-02-07 05:41:04 +01:00
			 */
2007-01-20 21:45:41 +01:00
			new_restrictlist = list_append_unique_ptr(new_restrictlist, rinfo);
2000-02-07 05:41:04 +01:00
		}
		else
		{
			/*
2005-10-15 04:49:52 +02:00
			 * This clause is still a join clause at this level, so we ignore
			 * it in this routine.
2000-02-07 05:41:04 +01:00
			 */
		}
	}

2007-01-20 21:45:41 +01:00
	return new_restrictlist;
2000-02-07 05:41:04 +01:00
}

2007-01-20 21:45:41 +01:00
static List *
2000-02-07 05:41:04 +01:00
subbuild_joinrel_joinlist(RelOptInfo *joinrel,
2007-01-20 21:45:41 +01:00
						  List *joininfo_list,
						  List *new_joininfo)
2000-02-07 05:41:04 +01:00
{
2005-06-09 06:19:00 +02:00
	ListCell *l;
2000-02-07 05:41:04 +01:00

2017-10-06 17:11:10 +02:00
	/* Expected to be called only for join between parent relations. */
	Assert(joinrel->reloptkind == RELOPT_JOINREL);

2005-06-09 06:19:00 +02:00
	foreach(l, joininfo_list)
2000-02-07 05:41:04 +01:00
	{
2005-06-09 06:19:00 +02:00
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
2000-02-07 05:41:04 +01:00

2005-06-09 06:19:00 +02:00
		if (bms_is_subset(rinfo->required_relids, joinrel->relids))
2000-02-07 05:41:04 +01:00
		{
			/*
2005-10-15 04:49:52 +02:00
			 * This clause becomes a restriction clause for the joinrel, since
			 * it refers to no outside rels. So we can ignore it in this
			 * routine.
2000-02-07 05:41:04 +01:00
			 */
		}
		else
		{
			/*
2005-10-15 04:49:52 +02:00
			 * This clause is still a join clause at this level, so add it to
2007-11-15 22:14:46 +01:00
			 * the new joininfo list, being careful to eliminate duplicates.
			 * (Since RestrictInfo nodes in different joinlists will have been
			 * multiply-linked rather than copied, pointer equality should be
			 * a sufficient test.)
2000-02-07 05:41:04 +01:00
			 */
2007-01-20 21:45:41 +01:00
			new_joininfo = list_append_unique_ptr(new_joininfo, rinfo);
2000-02-07 05:41:04 +01:00
		}
1996-07-09 08:22:35 +02:00
	}
2007-01-20 21:45:41 +01:00

	return new_joininfo;
1996-07-09 08:22:35 +02:00
}
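The two subbuild_* routines apply complementary halves of one test. A clause whose required_relids is a subset of the joinrel's relids becomes a restriction clause at this join; every other clause stays a join clause. The sketch below shows that split as a single function, assuming Relids are modeled as 64-bit masks. The `ToyClause` type and `split_clauses()` are hypothetical, and `is_subset()` only mirrors what bms_is_subset() computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clause carrying only the relids it needs (bit i = relid i). */
typedef struct ToyClause
{
	const char *text;
	uint64_t required_relids;
} ToyClause;

/* Like bms_is_subset(a, b): every member of a is also in b. */
static bool
is_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Route each clause to exactly one output: restrict_out[] if it can be
 * evaluated at a join producing joinrel_relids, joininfo_out[] otherwise.
 * Returns the number of restriction clauses; *njoin gets the rest.
 */
static size_t
split_clauses(const ToyClause *clauses, size_t n, uint64_t joinrel_relids,
			  const ToyClause **restrict_out, const ToyClause **joininfo_out,
			  size_t *njoin)
{
	size_t nrestrict = 0;

	*njoin = 0;
	for (size_t i = 0; i < n; i++)
	{
		if (is_subset(clauses[i].required_relids, joinrel_relids))
			restrict_out[nrestrict++] = &clauses[i];
		else
			joininfo_out[(*njoin)++] = &clauses[i];
	}
	return nrestrict;
}
```

For example, when a (relid 1) joins b (relid 2), a clause needing {a, b} becomes a restriction clause, while one needing {b, c} must wait until c joins. PostgreSQL builds the two outputs in separate routines because, as the header comment explains, the restriction list depends on the chosen input pair while the join list does not.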
2012-04-19 21:52:46 +02:00
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
/*
 * build_empty_join_rel
 *   Build a dummy join relation describing an empty set of base rels.
 *
 * This is used for queries with empty FROM clauses, such as "SELECT 2+2" or
 * "INSERT INTO foo VALUES(...)". We don't try very hard to make the empty
 * joinrel completely valid, since no real planning will be done with it ---
 * we just need it to carry a simple Result path out of query_planner().
 */
RelOptInfo *
build_empty_join_rel(PlannerInfo *root)
{
	RelOptInfo *joinrel;

	/* The dummy join relation should be the only one ... */
	Assert(root->join_rel_list == NIL);

	joinrel = makeNode(RelOptInfo);
	joinrel->reloptkind = RELOPT_JOINREL;
	joinrel->relids = NULL;		/* empty set */
	joinrel->rows = 1;			/* we produce one row for such cases */
	joinrel->rtekind = RTE_JOIN;
2016-03-14 21:59:59 +01:00
	joinrel->reltarget = create_empty_pathtarget();
2013-08-05 21:00:57 +02:00

	root->join_rel_list = lappend(root->join_rel_list, joinrel);

	return joinrel;
}

Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
/*
 * fetch_upper_rel
 *		Build a RelOptInfo describing some post-scan/join query processing,
 *		or return a pre-existing one if somebody already built it.
 *
 * An "upper" relation is identified by an UpperRelationKind and a Relids set.
 * The meaning of the Relids set is not specified here, and very likely will
 * vary for different relation kinds.
 *
 * Most of the fields in an upper-level RelOptInfo are not used and are not
 * set here (though makeNode should ensure they're zeroes). We basically only
 * care about fields that are of interest to add_path() and set_cheapest().
 */
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
	RelOptInfo *upperrel;
	ListCell   *lc;

	/*
	 * For the moment, our indexing data structure is just a List for each
	 * relation kind. If we ever get so many of one kind that this stops
	 * working well, we can improve it. No code outside this function should
	 * assume anything about how to find a particular upperrel.
	 */

	/* If we already made this upperrel for the query, return it */
	foreach(lc, root->upper_rels[kind])
	{
		upperrel = (RelOptInfo *) lfirst(lc);

		if (bms_equal(upperrel->relids, relids))
			return upperrel;
	}

	upperrel = makeNode(RelOptInfo);
	upperrel->reloptkind = RELOPT_UPPER_REL;
	upperrel->relids = bms_copy(relids);

	/* cheap startup cost is interesting iff not all tuples to be retrieved */
	upperrel->consider_startup = (root->tuple_fraction > 0);
	upperrel->consider_param_startup = false;
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
	upperrel->consider_parallel = false;	/* might get changed later */
2016-03-14 21:59:59 +01:00
	upperrel->reltarget = create_empty_pathtarget();
	upperrel->pathlist = NIL;
	upperrel->cheapest_startup_path = NULL;
	upperrel->cheapest_total_path = NULL;
	upperrel->cheapest_unique_path = NULL;
	upperrel->cheapest_parameterized_paths = NIL;

	root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);

	return upperrel;
}
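fetch_upper_rel performs a lookup-or-create over one List per UpperRelationKind, matching entries with bms_equal() on the Relids set. The standalone sketch below mimics that pattern outside the backend and is not PostgreSQL code. It makes three assumptions: a uint64_t mask stands in for Relids, a hand-rolled linked list stands in for root->upper_rels[kind], and the names UpperKind, UpperRel, Planner and fetch_upper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a 64-bit mask plays the role of Relids, and a
 * singly linked list plays the role of root->upper_rels[kind]. */
typedef uint64_t RelidMask;

typedef enum
{
	UPPER_GROUP_AGG,
	UPPER_ORDERED,
	UPPER_FINAL,
	UPPER_NKINDS				/* number of kinds, not a real kind */
} UpperKind;

typedef struct UpperRel
{
	RelidMask	relids;
	struct UpperRel *next;
} UpperRel;

typedef struct
{
	UpperRel   *upper_rels[UPPER_NKINDS];	/* one list per kind */
} Planner;

/* Return the rel already built for (kind, relids), or build and remember one. */
UpperRel *
fetch_upper(Planner *p, UpperKind kind, RelidMask relids)
{
	UpperRel   *r;

	for (r = p->upper_rels[kind]; r != NULL; r = r->next)
		if (r->relids == relids)	/* analogue of bms_equal() */
			return r;

	r = calloc(1, sizeof(UpperRel));	/* zeroed, like makeNode() */
	if (r == NULL)
		abort();
	r->relids = relids;
	r->next = p->upper_rels[kind];	/* prepend; the real code appends */
	p->upper_rels[kind] = r;
	return r;
}
```

Calling fetch_upper twice with the same kind and mask returns the same pointer, which is the property the real function gives its callers. The sketch prepends rather than appends. That is harmless here because an exact-match search does not depend on list order.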
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
/*
 * find_childrel_appendrelinfo
 *		Get the AppendRelInfo associated with an appendrel child rel.
 *
 * This search could be eliminated by storing a link in child RelOptInfos,
Fix some more problems with nested append relations.
As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.
Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.
Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels. The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family. That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them. So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.
The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced. In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members. (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong. A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
2014-10-02 01:30:24 +02:00
 * but for now it doesn't seem performance-critical. (Also, it might be
 * difficult to maintain such a link during mutation of the append_rel_list.)
 */
AppendRelInfo *
find_childrel_appendrelinfo(PlannerInfo *root, RelOptInfo *rel)
{
	Index		relid = rel->relid;
	ListCell   *lc;

	/* Should only be called on child rels */
	Assert(rel->reloptkind == RELOPT_OTHER_MEMBER_REL);

	foreach(lc, root->append_rel_list)
	{
		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);

		if (appinfo->child_relid == relid)
			return appinfo;
	}
	/* should have found the entry ... */
	elog(ERROR, "child rel %d not found in append_rel_list", relid);
	return NULL;				/* not reached */
}
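The comment above notes that the linear scan of append_rel_list could be replaced by a link stored in each child rel. The sketch below contrasts the two approaches in plain C. AppInfo, MAX_RELID, find_appinfo_linear and build_appinfo_links are hypothetical names. A NULL return stands in for the real function's elog(ERROR).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down AppendRelInfo: just the two relids used here. */
typedef struct
{
	unsigned	parent_relid;
	unsigned	child_relid;
} AppInfo;

#define MAX_RELID 16			/* assumed upper bound on relids */

/* Linear scan, like find_childrel_appendrelinfo over append_rel_list. */
const AppInfo *
find_appinfo_linear(const AppInfo *list, size_t n, unsigned child)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].child_relid == child)
			return &list[i];
	return NULL;				/* the real function raises elog(ERROR) */
}

/* The alternative mentioned in the comment: build a child-to-info link
 * table once, so each later lookup is a single array index. */
void
build_appinfo_links(const AppInfo *list, size_t n,
					const AppInfo *links[MAX_RELID + 1])
{
	for (unsigned r = 0; r <= MAX_RELID; r++)
		links[r] = NULL;
	for (size_t i = 0; i < n; i++)
		if (list[i].child_relid <= MAX_RELID)
			links[list[i].child_relid] = &list[i];
}
```

The link table replaces an O(n) scan per call with an O(1) lookup. The cost is that the table must be rebuilt or patched whenever the list changes, which is the maintenance problem the comment raises for mutation of append_rel_list.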

/*
 * find_childrel_parents
 *		Compute the set of parent relids of an appendrel child rel.
 *
 * Since appendrels can be nested, a child could have multiple levels of
 * appendrel ancestors. This function computes a Relids set of all the
 * parent relation IDs.
 */
Relids
find_childrel_parents(PlannerInfo *root, RelOptInfo *rel)
{
	Relids		result = NULL;

Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-04 04:41:31 +02:00
	Assert(rel->reloptkind == RELOPT_OTHER_MEMBER_REL);

	do
	{
		AppendRelInfo *appinfo = find_childrel_appendrelinfo(root, rel);
		Index		prelid = appinfo->parent_relid;

		result = bms_add_member(result, prelid);

		/* traverse up to the parent rel, loop if it's also a child rel */
		rel = find_base_rel(root, prelid);
	} while (rel->reloptkind == RELOPT_OTHER_MEMBER_REL);

	Assert(rel->reloptkind == RELOPT_BASEREL);

	return result;
}
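find_childrel_parents climbs from a child rel through any number of nested appendrel levels and adds each parent's relid to the result. Nesting arises, for example, when a UNION ALL member is itself an inheritance parent. Below is a minimal model of that loop. It assumes a hypothetical parent_of[] array, where 0 marks a top-level base rel, and uses a uint64_t bit mask in place of Relids. The function name childrel_parents is also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: parent_of[r] is the appendrel parent of rel r, or 0
 * when r is a top-level base rel (relid 0 is unused, as range table
 * indexes start at 1). A uint64_t bit mask stands in for Relids.
 */
uint64_t
childrel_parents(const unsigned *parent_of, unsigned rel)
{
	uint64_t	result = 0;

	assert(parent_of[rel] != 0);	/* must be called on a child rel */
	do
	{
		unsigned	prelid = parent_of[rel];

		assert(prelid < 64);	/* must fit in the mask */
		result |= UINT64_C(1) << prelid;	/* bms_add_member() analogue */
		rel = prelid;			/* climb one level */
	} while (parent_of[rel] != 0);	/* keep going while still a child */

	return result;
}
```

Take parent_of = {0, 0, 1, 1, 3, 3}. Rel 1 is a UNION ALL parent of rels 2 and 3, and rel 3 is in turn an inheritance parent of rels 4 and 5. The parents of rel 4 are therefore {1, 3}, which gives the mask (1 << 1) | (1 << 3) == 10.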

/*
 * get_baserel_parampathinfo
 *		Get the ParamPathInfo for a parameterized path for a base relation,
 *		constructing one if we don't have one already.
 *
 * This centralizes estimating the rowcounts for parameterized paths.
 * We need to cache those to be sure we use the same rowcount for all paths
 * of the same parameterization for a given rel. This is also a convenient
 * place to determine which movable join clauses the parameterized path will
 * be responsible for evaluating.
 */
ParamPathInfo *
get_baserel_parampathinfo(PlannerInfo *root, RelOptInfo *baserel,
						  Relids required_outer)
{
	ParamPathInfo *ppi;
	Relids		joinrelids;
	List	   *pclauses;
	double		rows;
	ListCell   *lc;

	/* Unparameterized paths have no ParamPathInfo */
	if (bms_is_empty(required_outer))
		return NULL;

	Assert(!bms_overlap(baserel->relids, required_outer));

	/* If we already have a PPI for this parameterization, just return it */
2017-08-15 18:30:38 +02:00
	if ((ppi = find_param_path_info(baserel, required_outer)))
		return ppi;

	/*
	 * Identify all joinclauses that are movable to this base rel given this
	 * parameterization.
	 */
	joinrelids = bms_union(baserel->relids, required_outer);
	pclauses = NIL;
	foreach(lc, baserel->joininfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

		if (join_clause_is_movable_into(rinfo,
										baserel->relids,
										joinrelids))
			pclauses = lappend(pclauses, rinfo);
	}

	/*
	 * Add in joinclauses generated by EquivalenceClasses, too. (These
	 * necessarily satisfy join_clause_is_movable_into.)
	 */
	pclauses = list_concat(pclauses,
						   generate_join_implied_equalities(root,
															joinrelids,
															required_outer,
															baserel));

	/* Estimate the number of rows returned by the parameterized scan */
	rows = get_parameterized_baserel_size(root, baserel, pclauses);

	/* And now we can build the ParamPathInfo */
	ppi = makeNode(ParamPathInfo);
	ppi->ppi_req_outer = required_outer;
	ppi->ppi_rows = rows;
	ppi->ppi_clauses = pclauses;
	baserel->ppilist = lappend(baserel->ppilist, ppi);

	return ppi;
}
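get_baserel_parampathinfo caches one ParamPathInfo per required_outer set, so every path with the same parameterization shares a single rowcount estimate and clause list. The sketch below models that caching together with the clause selection, under stated assumptions:

- **Simplified movability rule.** A clause is treated as movable if it references the base rel and nothing outside baserel plus required_outer. The real join_clause_is_movable_into() also accounts for outer joins.
- **Fixed selectivity.** Each movable clause is given a hypothetical selectivity of 0.1.
- **Illustrative names only.** BaseRel, PPI, get_ppi and the 8-entry cache are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t RelidMask;		/* stand-in for Relids */

#define MAX_PPIS 8				/* assumed small fixed cache size */

typedef struct
{
	RelidMask	required_outer;	/* cache key */
	double		rows;			/* estimate shared by all such paths */
	int			nclauses;		/* join clauses this PPI enforces */
} PPI;

typedef struct
{
	RelidMask	relids;
	double		base_rows;
	const RelidMask *joinclause_relids;	/* rels each join clause mentions */
	int			njoinclauses;
	PPI			ppis[MAX_PPIS];
	int			nppis;
} BaseRel;

/*
 * Simplified movability rule (an assumption): the clause must mention this
 * rel and nothing outside rel + required_outer. The real
 * join_clause_is_movable_into() also respects outer-join constraints.
 */
static int
movable_into(RelidMask clause, RelidMask rel, RelidMask joinrelids)
{
	return (clause & rel) != 0 && (clause & ~joinrelids) == 0;
}

const PPI *
get_ppi(BaseRel *rel, RelidMask required_outer)
{
	RelidMask	joinrelids;
	double		rows;
	int			n = 0;
	PPI		   *ppi;

	if (required_outer == 0)
		return NULL;			/* unparameterized: no PPI */

	for (int i = 0; i < rel->nppis; i++)	/* reuse a cached entry */
		if (rel->ppis[i].required_outer == required_outer)
			return &rel->ppis[i];

	joinrelids = rel->relids | required_outer;
	rows = rel->base_rows;
	for (int i = 0; i < rel->njoinclauses; i++)
	{
		if (movable_into(rel->joinclause_relids[i], rel->relids, joinrelids))
		{
			n++;
			rows *= 0.1;		/* hypothetical fixed selectivity */
		}
	}

	assert(rel->nppis < MAX_PPIS);
	ppi = &rel->ppis[rel->nppis++];
	ppi->required_outer = required_outer;
	ppi->rows = rows;
	ppi->nclauses = n;
	return ppi;
}
```

Asking twice for the same required_outer mask returns the same cached entry, so two paths with that parameterization cannot end up with different row estimates. A wider parameterization picks up more movable clauses and therefore a smaller estimate. This matches the usual expectation that a more-parameterized path yields fewer rows.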

/*
 * get_joinrel_parampathinfo
 *		Get the ParamPathInfo for a parameterized path for a join relation,
 *		constructing one if we don't have one already.
 *
 * This centralizes estimating the rowcounts for parameterized paths.
 * We need to cache those to be sure we use the same rowcount for all paths
 * of the same parameterization for a given rel. This is also a convenient
 * place to determine which movable join clauses the parameterized path will
 * be responsible for evaluating.
 *
 * outer_path and inner_path are a pair of input paths that can be used to
 * construct the join, and restrict_clauses is the list of regular join
 * clauses (including clauses derived from EquivalenceClasses) that must be
 * applied at the join node when using these inputs.
 *
 * Unlike the situation for base rels, the set of movable join clauses to be
 * enforced at a join varies with the selected pair of input paths, so we
 * must calculate that and pass it back, even if we already have a matching
 * ParamPathInfo. We handle this by adding any clauses moved down to this
 * join to *restrict_clauses, which is an in/out parameter. (The addition
 * is done in such a way as to not modify the passed-in List structure.)
 *
 * Note: when considering a nestloop join, the caller must have removed from
 * restrict_clauses any movable clauses that are themselves scheduled to be
 * pushed into the right-hand path. We do not do that here since it's
 * unnecessary for other join types.
 */
ParamPathInfo *
get_joinrel_parampathinfo(PlannerInfo *root, RelOptInfo *joinrel,
						  Path *outer_path,
						  Path *inner_path,
						  SpecialJoinInfo *sjinfo,
						  Relids required_outer,
						  List **restrict_clauses)
{
	ParamPathInfo *ppi;
	Relids		join_and_req;
	Relids		outer_and_req;
	Relids		inner_and_req;
	List	   *pclauses;
	List	   *eclauses;
Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.X = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like
    Nested Loop
      ->  Scan X
      ->  Nested Loop
            ->  Scan Y
            ->  Scan Z
                  Filter: Z.Z = X.X
The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node. On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members. So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there. If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.
This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape. (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.) Many
thanks to Alexander Kirkouski for submitting a reproducible test case.
The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-30 02:19:38 +02:00
	List	   *dropped_ecs;
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
	double		rows;
	ListCell   *lc;

	/* Unparameterized paths have no ParamPathInfo or extra join clauses */
	if (bms_is_empty(required_outer))
		return NULL;

	Assert(!bms_overlap(joinrel->relids, required_outer));

	/*
	 * Identify all joinclauses that are movable to this join rel given this
	 * parameterization. These are the clauses that are movable into this
	 * join, but not movable into either input path. Treat an unparameterized
	 * input path as not accepting parameterized clauses (because it won't,
	 * per the shortcut exit above), even though the joinclause movement rules
	 * might allow the same clauses to be moved into a parameterized path for
	 * that rel.
	 */
	join_and_req = bms_union(joinrel->relids, required_outer);
	if (outer_path->param_info)
		outer_and_req = bms_union(outer_path->parent->relids,
								  PATH_REQ_OUTER(outer_path));
	else
2012-06-10 21:20:04 +02:00
		outer_and_req = NULL;	/* outer path does not accept parameters */
2012-04-19 21:52:46 +02:00
	if (inner_path->param_info)
		inner_and_req = bms_union(inner_path->parent->relids,
								  PATH_REQ_OUTER(inner_path));
	else
2012-06-10 21:20:04 +02:00
		inner_and_req = NULL;	/* inner path does not accept parameters */
2012-04-19 21:52:46 +02:00

	pclauses = NIL;
	foreach(lc, joinrel->joininfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

		if (join_clause_is_movable_into(rinfo,
										joinrel->relids,
										join_and_req) &&
			!join_clause_is_movable_into(rinfo,
										 outer_path->parent->relids,
										 outer_and_req) &&
			!join_clause_is_movable_into(rinfo,
										 inner_path->parent->relids,
										 inner_and_req))
			pclauses = lappend(pclauses, rinfo);
	}

	/* Consider joinclauses generated by EquivalenceClasses, too */
	eclauses = generate_join_implied_equalities(root,
												join_and_req,
												required_outer,
												joinrel);
	/* We only want ones that aren't movable to lower levels */
2016-04-30 02:19:38 +02:00
	dropped_ecs = NIL;
2012-04-19 21:52:46 +02:00
	foreach(lc, eclauses)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

2015-07-28 19:20:39 +02:00
		/*
		 * In principle, join_clause_is_movable_into() should accept anything
		 * returned by generate_join_implied_equalities(); but because its
		 * analysis is only approximate, sometimes it doesn't. So we
		 * currently cannot use this Assert; instead just assume it's okay to
		 * apply the joinclause at this level.
		 */
#ifdef NOT_USED
2012-04-19 21:52:46 +02:00
		Assert(join_clause_is_movable_into(rinfo,
										   joinrel->relids,
										   join_and_req));
2015-07-28 19:20:39 +02:00
#endif
2016-04-30 02:19:38 +02:00
		if (join_clause_is_movable_into(rinfo,
										outer_path->parent->relids,
										outer_and_req))
			continue;			/* drop if movable into LHS */
		if (join_clause_is_movable_into(rinfo,
										inner_path->parent->relids,
										inner_and_req))
		{
			/* drop if movable into RHS, but remember EC for use below */
			Assert(rinfo->left_ec == rinfo->right_ec);
			dropped_ecs = lappend(dropped_ecs, rinfo->left_ec);
			continue;
		}
		pclauses = lappend(pclauses, rinfo);
	}

	/*
	 * EquivalenceClasses are harder to deal with than we could wish, because
	 * of the fact that a given EC can generate different clauses depending on
	 * context. Suppose we have an EC {X.X, Y.Y, Z.Z} where X and Y are the
	 * LHS and RHS of the current join and Z is in required_outer, and further
	 * suppose that the inner_path is parameterized by both X and Z. The code
	 * above will have produced either Z.Z = X.X or Z.Z = Y.Y from that EC,
	 * and in the latter case will have discarded it as being movable into the
	 * RHS. However, the EC machinery might have produced either Y.Y = X.X or
	 * Y.Y = Z.Z as the EC enforcement clause within the inner_path; it will
	 * not have produced both, and we can't readily tell from here which one
	 * it did pick. If we add no clause to this join, we'll end up with
	 * insufficient enforcement of the EC; either Z.Z or X.X will fail to be
	 * constrained to be equal to the other members of the EC. (When we come
	 * to join Z to this X/Y path, we will certainly drop whichever EC clause
	 * is generated at that join, so this omission won't get fixed later.)
	 *
	 * To handle this, for each EC we discarded such a clause from, try to
	 * generate a clause connecting the required_outer rels to the join's LHS
	 * ("Z.Z = X.X" in the terms of the above example). If successful, and if
	 * the clause can't be moved to the LHS, add it to the current join's
	 * restriction clauses. (If an EC cannot generate such a clause then it
	 * has nothing that needs to be enforced here, while if the clause can be
	 * moved into the LHS then it should have been enforced within that path.)
	 *
	 * Note that we don't need similar processing for ECs whose clause was
	 * considered to be movable into the LHS, because the LHS can't refer to
	 * the RHS so there is no comparable ambiguity about what it might
	 * actually be enforcing internally.
	 */
	if (dropped_ecs)
	{
		Relids		real_outer_and_req;

		real_outer_and_req = bms_union(outer_path->parent->relids,
									   required_outer);
		eclauses =
			generate_join_implied_equalities_for_ecs(root,
													 dropped_ecs,
													 real_outer_and_req,
													 required_outer,
													 outer_path->parent);
		foreach(lc, eclauses)
		{
			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

			/* As above, can't quite assert this here */
#ifdef NOT_USED
			Assert(join_clause_is_movable_into(rinfo,
											   outer_path->parent->relids,
											   real_outer_and_req));
#endif
			if (!join_clause_is_movable_into(rinfo,
											 outer_path->parent->relids,
											 outer_and_req))
				pclauses = lappend(pclauses, rinfo);
		}
2012-04-19 21:52:46 +02:00
	}

	/*
	 * Now, attach the identified moved-down clauses to the caller's
	 * restrict_clauses list. By using list_concat in this order, we leave
	 * the original list structure of restrict_clauses undamaged.
	 */
	*restrict_clauses = list_concat(pclauses, *restrict_clauses);

	/* If we already have a PPI for this parameterization, just return it */
2017-08-15 18:30:38 +02:00
	if ((ppi = find_param_path_info(joinrel, required_outer)))
		return ppi;
2012-04-19 21:52:46 +02:00

	/* Estimate the number of rows returned by the parameterized join */
	rows = get_parameterized_joinrel_size(root, joinrel,
2016-06-18 21:22:34 +02:00
										  outer_path,
										  inner_path,
2012-04-19 21:52:46 +02:00
										  sjinfo,
										  *restrict_clauses);

	/*
2014-05-06 18:12:18 +02:00
	 * And now we can build the ParamPathInfo. No point in saving the
2012-04-19 21:52:46 +02:00
	 * input-pair-dependent clause list, though.
	 *
	 * Note: in GEQO mode, we'll be called in a temporary memory context, but
	 * the joinrel structure is there too, so no problem.
	 */
	ppi = makeNode(ParamPathInfo);
	ppi->ppi_req_outer = required_outer;
	ppi->ppi_rows = rows;
	ppi->ppi_clauses = NIL;
	joinrel->ppilist = lappend(joinrel->ppilist, ppi);

	return ppi;
}
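The clause-placement rule in get_joinrel_parampathinfo() works as follows. A joinclause stays at this join only if it is movable into the join but not into either input path. An unparameterized input is treated as accepting nothing.

The standalone model below illustrates that rule. It is a sketch under stated assumptions, not PostgreSQL code:
- Relation sets are bits in a hypothetical `RelMask` (`uint64_t`) rather than a `Bitmapset`.
- `movable_into()`, `accepted_rels()` and `clause_belongs_here()` are hypothetical names.
- `movable_into()` keeps only two checks: "every referenced rel is available" and "the clause references the target rel". The real join_clause_is_movable_into() also applies outer-join restrictions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: each base relation is one bit of a 64-bit mask. */
typedef uint64_t RelMask;

#define REL(n) ((RelMask) 1 << (n))

typedef struct
{
	RelMask		rels;			/* relids of the input path's rel */
	RelMask		req_outer;		/* its parameterization, 0 if none */
} InputPath;

/* Simplified movability test (outer-join checks deliberately omitted). */
static bool
movable_into(RelMask clause_rels, RelMask target, RelMask target_and_outer)
{
	if ((clause_rels & ~target_and_outer) != 0)
		return false;			/* needs a rel that is not available */
	if ((clause_rels & target) == 0)
		return false;			/* references none of the target rels */
	return true;
}

/* An unparameterized input accepts no parameterized clauses at all. */
static RelMask
accepted_rels(InputPath p)
{
	return p.req_outer != 0 ? (p.rels | p.req_outer) : 0;
}

/* True if the clause must be enforced at this join rather than below it. */
static bool
clause_belongs_here(RelMask clause_rels, RelMask join_rels,
					RelMask required_outer, InputPath outer, InputPath inner)
{
	assert((join_rels & required_outer) == 0);
	return movable_into(clause_rels, join_rels, join_rels | required_outer) &&
		!movable_into(clause_rels, outer.rels, accepted_rels(outer)) &&
		!movable_into(clause_rels, inner.rels, accepted_rels(inner));
}
```

Consider this setup: X is bit 0 and the unparameterized outer input, Y is bit 1 and an inner input parameterized by Z, and Z is bit 2 and in required_outer. In that case a Z/Y clause stays below the join, because the inner path can already enforce it. A Z/X clause, by contrast, has to be applied at the join itself.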

/*
 * get_appendrel_parampathinfo
 *		Get the ParamPathInfo for a parameterized path for an append relation.
 *
 * For an append relation, the rowcount estimate will just be the sum of
2014-05-06 18:12:18 +02:00
 * the estimates for its children. However, we still need a ParamPathInfo
 * to flag the fact that the path requires parameters. So this just creates
2012-04-19 21:52:46 +02:00
 * a suitable struct with zero ppi_rows (and no ppi_clauses either, since
 * the Append node isn't responsible for checking quals).
 */
ParamPathInfo *
get_appendrel_parampathinfo(RelOptInfo *appendrel, Relids required_outer)
{
	ParamPathInfo *ppi;

	/* Unparameterized paths have no ParamPathInfo */
	if (bms_is_empty(required_outer))
		return NULL;

	Assert(!bms_overlap(appendrel->relids, required_outer));

	/* If we already have a PPI for this parameterization, just return it */
2017-08-15 18:30:38 +02:00
	if ((ppi = find_param_path_info(appendrel, required_outer)))
		return ppi;
2012-04-19 21:52:46 +02:00
|
|
|
|
|
|
|
/* Else build the ParamPathInfo */
|
|
|
|
ppi = makeNode(ParamPathInfo);
|
|
|
|
ppi->ppi_req_outer = required_outer;
|
|
|
|
ppi->ppi_rows = 0;
|
|
|
|
ppi->ppi_clauses = NIL;
|
|
|
|
appendrel->ppilist = lappend(appendrel->ppilist, ppi);
|
|
|
|
|
|
|
|
return ppi;
|
|
|
|
}
|
2017-08-15 18:30:38 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns a ParamPathInfo for the parameterization given by required_outer, if
|
|
|
|
* already available in the given rel. Returns NULL otherwise.
|
|
|
|
*/
|
|
|
|
ParamPathInfo *
|
|
|
|
find_param_path_info(RelOptInfo *rel, Relids required_outer)
|
|
|
|
{
|
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
foreach(lc, rel->ppilist)
|
|
|
|
{
|
|
|
|
ParamPathInfo *ppi = (ParamPathInfo *) lfirst(lc);
|
|
|
|
|
|
|
|
if (bms_equal(ppi->ppi_req_outer, required_outer))
|
|
|
|
return ppi;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
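/*
 * Illustrative sketch only; hypothetical code, not part of PostgreSQL.
 * It models the get-or-create caching that get_appendrel_parampathinfo()
 * and find_param_path_info() perform on rel->ppilist. The caching ensures
 * that every path with the same parameterization shares one ParamPathInfo.
 * Assumptions: a relid set is a 64-bit mask (bit i = relid i) instead of a
 * Bitmapset, and the cache is a fixed-size array instead of a List. All
 * Demo* names are made up for this example.
 */
typedef unsigned long long DemoRelids;

typedef struct DemoPPI
{
	DemoRelids	req_outer;		/* required outer rels (the cache key) */
	double		rows;			/* stays 0 here, as for an append rel */
} DemoPPI;

#define DEMO_MAX_PPIS 16

typedef struct DemoRel
{
	DemoRelids	relids;			/* base rels making up this rel */
	DemoPPI		ppis[DEMO_MAX_PPIS];	/* stand-in for rel->ppilist */
	int			nppis;
} DemoRel;

/* Exact-match lookup, mirroring find_param_path_info()'s bms_equal() test */
static DemoPPI *
demo_find_ppi(DemoRel *rel, DemoRelids required_outer)
{
	int			i;

	for (i = 0; i < rel->nppis; i++)
	{
		if (rel->ppis[i].req_outer == required_outer)
			return &rel->ppis[i];
	}
	return NULL;
}

/* Return the cached entry for required_outer, creating it on first use */
static DemoPPI *
demo_get_ppi(DemoRel *rel, DemoRelids required_outer)
{
	DemoPPI    *ppi;

	/* Unparameterized paths have no entry */
	if (required_outer == 0)
		return NULL;

	/* A rel cannot depend on itself; PostgreSQL Asserts this instead */
	if ((rel->relids & required_outer) != 0)
		return NULL;

	ppi = demo_find_ppi(rel, required_outer);
	if (ppi != NULL)
		return ppi;

	/* The fixed capacity is a simplification of this sketch */
	if (rel->nppis >= DEMO_MAX_PPIS)
		return NULL;

	ppi = &rel->ppis[rel->nppis++];
	ppi->req_outer = required_outer;
	ppi->rows = 0;
	return ppi;
}

/*
 * Usage: for a rel with relids {1}, asking twice for {2,3} returns the same
 * entry, while asking for {2} creates a second, distinct entry.
 */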
/*
 * build_joinrel_partition_info
 *		If the two relations have the same partitioning scheme, their join may
 *		be partitioned and will follow the same partitioning scheme as the
 *		joining relations. Set the partition scheme and partition key
 *		expressions in the join relation.
 */
static void
build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
							 RelOptInfo *inner_rel, List *restrictlist,
							 JoinType jointype)
{
	int			partnatts;
	int			cnt;
	PartitionScheme part_scheme;

	/* Nothing to do if partitionwise join technique is disabled. */
	if (!enable_partitionwise_join)
	{
		Assert(!IS_PARTITIONED_REL(joinrel));
		return;
	}

	/*
	 * We can only consider this join as an input to further partitionwise
	 * joins if (a) the input relations are partitioned, (b) the partition
	 * schemes match, and (c) we can identify an equi-join between the
	 * partition keys. Note that if it were possible for
	 * have_partkey_equi_join to return different answers for the same joinrel
	 * depending on which join ordering we try first, this logic would break.
	 * That shouldn't happen, though, because of the way the query planner
	 * deduces implied equalities and reorders the joins. Please see
	 * optimizer/README for details.
	 */
	if (!IS_PARTITIONED_REL(outer_rel) || !IS_PARTITIONED_REL(inner_rel) ||
		outer_rel->part_scheme != inner_rel->part_scheme ||
		!have_partkey_equi_join(outer_rel, inner_rel, jointype, restrictlist))
	{
		Assert(!IS_PARTITIONED_REL(joinrel));
		return;
	}

	part_scheme = outer_rel->part_scheme;

	Assert(REL_HAS_ALL_PART_PROPS(outer_rel) &&
		   REL_HAS_ALL_PART_PROPS(inner_rel));

	/*
	 * For now, our partition matching algorithm can match partitions only
	 * when the partition bounds of the joining relations are exactly the
	 * same. So, bail out otherwise.
	 */
	if (outer_rel->nparts != inner_rel->nparts ||
		!partition_bounds_equal(part_scheme->partnatts,
								part_scheme->parttyplen,
								part_scheme->parttypbyval,
								outer_rel->boundinfo, inner_rel->boundinfo))
	{
		Assert(!IS_PARTITIONED_REL(joinrel));
		return;
	}

	/*
	 * This function will be called only once for each joinrel, hence the
	 * joinrel should not yet have its partition scheme, partition bounds,
	 * partition key expressions, or child-relation array set.
	 */
	Assert(!joinrel->part_scheme && !joinrel->partexprs &&
		   !joinrel->nullable_partexprs && !joinrel->part_rels &&
		   !joinrel->boundinfo);

	/*
	 * The join relation is partitioned using the same partitioning scheme as
	 * the joining relations and has the same bounds.
	 */
	joinrel->part_scheme = part_scheme;
	joinrel->boundinfo = outer_rel->boundinfo;
	partnatts = joinrel->part_scheme->partnatts;
	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
	joinrel->nullable_partexprs =
		(List **) palloc0(sizeof(List *) * partnatts);
	joinrel->nparts = outer_rel->nparts;
	joinrel->part_rels =
		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);

	/*
	 * Construct partition keys for the join.
	 *
	 * An INNER join between two partitioned relations can be regarded as
	 * partitioned by either key expression. For example, A INNER JOIN B ON
	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
	 * equivalent.
	 *
	 * For a SEMI or ANTI join, the result can only be regarded as being
	 * partitioned in the same manner as the outer side, since the inner
	 * columns are not retained.
	 *
	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
	 * thus partition keys of an OUTER join should include partition key
	 * expressions from the OUTER side only. However, because all
	 * commonly-used comparison operators are strict, the presence of nulls on
	 * the nullable side doesn't cause any problem; they can't match anything
	 * at future join levels anyway. Therefore, we track two sets of
	 * expressions: those that authentically partition the relation
	 * (partexprs) and those that partition the relation with the exception
	 * that extra nulls may be present (nullable_partexprs). When the
	 * comparison operator is strict, the latter is just as good as the
	 * former.
	 */
	for (cnt = 0; cnt < partnatts; cnt++)
	{
		List	   *outer_expr;
		List	   *outer_null_expr;
		List	   *inner_expr;
		List	   *inner_null_expr;
		List	   *partexpr = NIL;
		List	   *nullable_partexpr = NIL;

		outer_expr = list_copy(outer_rel->partexprs[cnt]);
		outer_null_expr = list_copy(outer_rel->nullable_partexprs[cnt]);
		inner_expr = list_copy(inner_rel->partexprs[cnt]);
		inner_null_expr = list_copy(inner_rel->nullable_partexprs[cnt]);

		switch (jointype)
		{
			case JOIN_INNER:
				partexpr = list_concat(outer_expr, inner_expr);
				nullable_partexpr = list_concat(outer_null_expr,
												inner_null_expr);
				break;

			case JOIN_SEMI:
			case JOIN_ANTI:
				partexpr = outer_expr;
				nullable_partexpr = outer_null_expr;
				break;

			case JOIN_LEFT:
				partexpr = outer_expr;
				nullable_partexpr = list_concat(inner_expr,
												outer_null_expr);
				nullable_partexpr = list_concat(nullable_partexpr,
												inner_null_expr);
				break;

			case JOIN_FULL:
				nullable_partexpr = list_concat(outer_expr,
												inner_expr);
				nullable_partexpr = list_concat(nullable_partexpr,
												outer_null_expr);
				nullable_partexpr = list_concat(nullable_partexpr,
												inner_null_expr);
				break;

			default:
				elog(ERROR, "unrecognized join type: %d", (int) jointype);

		}

		joinrel->partexprs[cnt] = partexpr;
		joinrel->nullable_partexprs[cnt] = nullable_partexpr;
	}
}
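/*
 * Illustrative sketch only; hypothetical code, not part of PostgreSQL.
 * For each join type, it tabulates which of the four per-column input
 * lists (outer/inner partexprs and nullable_partexprs) the switch in
 * build_joinrel_partition_info() routes into the join's partexprs and
 * which into its nullable_partexprs. Assumptions: each input list is
 * represented by one bit of a mask, so the concatenation order used by
 * list_concat() in the real code is not modeled. All Demo* names are
 * made up for this example.
 */
enum
{
	DEMO_OUTER = 1 << 0,		/* outer_rel->partexprs[cnt] */
	DEMO_OUTER_NULL = 1 << 1,	/* outer_rel->nullable_partexprs[cnt] */
	DEMO_INNER = 1 << 2,		/* inner_rel->partexprs[cnt] */
	DEMO_INNER_NULL = 1 << 3	/* inner_rel->nullable_partexprs[cnt] */
};

typedef enum DemoJoinType
{
	DEMO_JOIN_INNER,
	DEMO_JOIN_LEFT,
	DEMO_JOIN_FULL,
	DEMO_JOIN_SEMI,
	DEMO_JOIN_ANTI,
	DEMO_JOIN_RIGHT				/* falls to default, as in the switch above */
} DemoJoinType;

/*
 * Sets *partexpr and *nullable_partexpr to masks of the input lists each
 * output list is built from. Returns 0 for join types that the real
 * switch rejects with elog(ERROR), and 1 otherwise.
 */
static int
demo_join_partkeys(DemoJoinType jointype, int *partexpr, int *nullable_partexpr)
{
	switch (jointype)
	{
		case DEMO_JOIN_INNER:
			/* either side's keys authentically partition the result */
			*partexpr = DEMO_OUTER | DEMO_INNER;
			*nullable_partexpr = DEMO_OUTER_NULL | DEMO_INNER_NULL;
			return 1;
		case DEMO_JOIN_SEMI:
		case DEMO_JOIN_ANTI:
			/* inner columns are not retained */
			*partexpr = DEMO_OUTER;
			*nullable_partexpr = DEMO_OUTER_NULL;
			return 1;
		case DEMO_JOIN_LEFT:
			/* inner keys may be NULL-extended, so they become nullable */
			*partexpr = DEMO_OUTER;
			*nullable_partexpr = DEMO_INNER | DEMO_OUTER_NULL | DEMO_INNER_NULL;
			return 1;
		case DEMO_JOIN_FULL:
			/* either side may be NULL-extended */
			*partexpr = 0;
			*nullable_partexpr = DEMO_OUTER | DEMO_INNER |
				DEMO_OUTER_NULL | DEMO_INNER_NULL;
			return 1;
		default:
			return 0;
	}
}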