/*-------------------------------------------------------------------------
 *
 * allpaths.c
 *	  Routines to find possible search paths for processing a query
 *
 * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/optimizer/path/allpaths.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include <limits.h>
#include <math.h>

#include "access/sysattr.h"
#include "access/tsmapi.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#ifdef OPTIMIZER_DEBUG
#include "nodes/print.h"
#endif
#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
#include "optimizer/inherit.h"
#include "optimizer/optimizer.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"


/* results of subquery_is_pushdown_safe */
typedef struct pushdown_safety_info
{
	bool	   *unsafeColumns;	/* which output columns are unsafe to use */
	bool		unsafeVolatile; /* don't push down volatile quals */
	bool		unsafeLeaky;	/* don't push down leaky quals */
} pushdown_safety_info;

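/*
 * A minimal sketch of how this struct gets initialized, modeled on
 * set_subquery_pathlist() (illustrative, not verbatim; unsafeColumns is
 * indexed by targetlist resno, hence the extra slot):
 *
 *		pushdown_safety_info safetyInfo;
 *
 *		memset(&safetyInfo, 0, sizeof(safetyInfo));
 *		safetyInfo.unsafeColumns = (bool *)
 *			palloc0((list_length(subquery->targetList) + 1) * sizeof(bool));
 *		safetyInfo.unsafeLeaky = rte->security_barrier;
 */
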
/* These parameters are set by GUC */
bool		enable_geqo = false;	/* just in case GUC doesn't set it */
int			geqo_threshold;
int			min_parallel_table_scan_size;
int			min_parallel_index_scan_size;

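/*
 * These variables correspond to the "enable_geqo", "geqo_threshold",
 * "min_parallel_table_scan_size", and "min_parallel_index_scan_size"
 * configuration settings, e.g. (illustrative values only):
 *
 *		SET geqo_threshold = 12;
 *		SET min_parallel_table_scan_size = '8MB';
 */
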
/* Hook for plugins to get control in set_rel_pathlist() */
set_rel_pathlist_hook_type set_rel_pathlist_hook = NULL;

/* Hook for plugins to replace standard_join_search() */
join_search_hook_type join_search_hook = NULL;

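/*
 * A minimal sketch of how an extension might install one of these hooks
 * from its _PG_init(); the "my_"/"prev_" names are hypothetical, while the
 * hook signature is the one declared in optimizer/paths.h:
 *
 *		static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;
 *
 *		static void
 *		my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 *							Index rti, RangeTblEntry *rte)
 *		{
 *			if (prev_set_rel_pathlist_hook)
 *				prev_set_rel_pathlist_hook(root, rel, rti, rte);
 *			(add or replace paths in rel->pathlist here)
 *		}
 *
 *		void
 *		_PG_init(void)
 *		{
 *			prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
 *			set_rel_pathlist_hook = my_set_rel_pathlist;
 *		}
 */
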
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
						 Index rti, RangeTblEntry *rte);
static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
							 Index rti, RangeTblEntry *rte);
static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
							   RangeTblEntry *rte);
static void create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel);
static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
									  RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
								   RangeTblEntry *rte);
static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
									 RangeTblEntry *rte);
static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
										 RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
							 RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
								 RangeTblEntry *rte);
static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
								Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
									Index rti, RangeTblEntry *rte);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
										 List *live_childrels,
										 List *all_child_pathkeys,
										 List *partitioned_rels);
static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
												   RelOptInfo *rel,
												   Relids required_outer);
static void accumulate_append_subpath(Path *path,
									  List **subpaths, List **special_subpaths);
static Path *get_singleton_append_subpath(Path *path);
static void set_dummy_rel_pathlist(RelOptInfo *rel);
static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
								  Index rti, RangeTblEntry *rte);
static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
								  RangeTblEntry *rte);
static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel,
								RangeTblEntry *rte);
static void set_tablefunc_pathlist(PlannerInfo *root, RelOptInfo *rel,
								   RangeTblEntry *rte);
static void set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel,
							 RangeTblEntry *rte);
static void set_namedtuplestore_pathlist(PlannerInfo *root, RelOptInfo *rel,
										 RangeTblEntry *rte);
static void set_result_pathlist(PlannerInfo *root, RelOptInfo *rel,
								RangeTblEntry *rte);
static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
								   RangeTblEntry *rte);
static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
									  pushdown_safety_info *safetyInfo);
static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
								  pushdown_safety_info *safetyInfo);
static void check_output_expressions(Query *subquery,
									 pushdown_safety_info *safetyInfo);
static void compare_tlist_datatypes(List *tlist, List *colTypes,
									pushdown_safety_info *safetyInfo);
static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
static bool qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
								  pushdown_safety_info *safetyInfo);
static void subquery_push_qual(Query *subquery,
							   RangeTblEntry *rte, Index rti, Node *qual);
static void recurse_push_qual(Node *setOp, Query *topquery,
							  RangeTblEntry *rte, Index rti, Node *qual);
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);


/*
 * make_one_rel
 *	  Finds all possible access paths for executing a query, returning a
 *	  single rel that represents the join of all base rels in the query.
 */
RelOptInfo *
make_one_rel(PlannerInfo *root, List *joinlist)
{
	RelOptInfo *rel;
	Index		rti;
	double		total_pages;

	/*
	 * Construct the all_baserels Relids set.
	 */
	root->all_baserels = NULL;
	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		RelOptInfo *brel = root->simple_rel_array[rti];

		/* there may be empty slots corresponding to non-baserel RTEs */
		if (brel == NULL)
			continue;

		Assert(brel->relid == rti); /* sanity check on array */

		/* ignore RTEs that are "other rels" */
		if (brel->reloptkind != RELOPT_BASEREL)
			continue;

		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
	}

	/* Mark base rels as to whether we care about fast-start plans */
	set_base_rel_consider_startup(root);

	/*
	 * Compute size estimates and consider_parallel flags for each base rel.
	 */
	set_base_rel_sizes(root);

	/*
	 * We should now have size estimates for every actual table involved in
	 * the query, and we also know which if any have been deleted from the
	 * query by join removal, pruned by partition pruning, or eliminated by
	 * constraint exclusion.  So we can now compute total_table_pages.
	 *
	 * Note that appendrels are not double-counted here, even though we don't
	 * bother to distinguish RelOptInfos for appendrel parents, because the
	 * parents will have pages = 0.
	 *
	 * XXX if a table is self-joined, we will count it once per appearance,
	 * which perhaps is the wrong thing ... but that's not completely clear,
	 * and detecting self-joins here is difficult, so ignore it for now.
	 */
	total_pages = 0;
	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		RelOptInfo *brel = root->simple_rel_array[rti];

		if (brel == NULL)
			continue;

		Assert(brel->relid == rti); /* sanity check on array */

		if (IS_DUMMY_REL(brel))
			continue;

		if (IS_SIMPLE_REL(brel))
			total_pages += (double) brel->pages;
	}
	root->total_table_pages = total_pages;

	/*
	 * Generate access paths for each base rel.
	 */
	set_base_rel_pathlists(root);

	/*
	 * Generate access paths for the entire join tree.
	 */
	rel = make_rel_from_joinlist(root, joinlist);

	/*
	 * The result should join all and only the query's base rels.
	 */
	Assert(bms_equal(rel->relids, root->all_baserels));

	return rel;
}

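/*
 * Note (orientation, not from the original comments): make_one_rel() is
 * reached from query_planner(), and its phases run strictly in order --
 * startup-flag marking, size estimation, base-rel path generation, then
 * the join search via make_rel_from_joinlist().
 */
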
/*
 * set_base_rel_consider_startup
 *	  Set the consider_[param_]startup flags for each base-relation entry.
 *
 * For the moment, we only deal with consider_param_startup here; because the
 * logic for consider_startup is pretty trivial and is the same for every base
 * relation, we just let build_simple_rel() initialize that flag correctly to
 * start with.  If that logic ever gets more complicated it would probably
 * be better to move it here.
 */
static void
set_base_rel_consider_startup(PlannerInfo *root)
{
	/*
	 * Since parameterized paths can only be used on the inside of a nestloop
	 * join plan, there is usually little value in considering fast-start
	 * plans for them.  However, for relations that are on the RHS of a SEMI
	 * or ANTI join, a fast-start plan can be useful because we're only going
	 * to care about fetching one tuple anyway.
	 *
	 * To minimize growth of planning time, we currently restrict this to
	 * cases where the RHS is a single base relation, not a join; there is no
	 * provision for consider_param_startup to get set at all on joinrels.
	 * Also we don't worry about appendrels.  costsize.c's costing rules for
	 * nestloop semi/antijoins don't consider such cases either.
	 */
	ListCell   *lc;

	foreach(lc, root->join_info_list)
	{
		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
		int			varno;

		if ((sjinfo->jointype == JOIN_SEMI || sjinfo->jointype == JOIN_ANTI) &&
			bms_get_singleton_member(sjinfo->syn_righthand, &varno))
		{
			RelOptInfo *rel = find_base_rel(root, varno);

			rel->consider_param_startup = true;
		}
	}
}

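/*
 * Illustrative case (an assumption, not taken from this file): in
 *
 *		SELECT * FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.x = a.x);
 *
 * "b" is the single base relation on the RHS of a SEMI join, so the loop
 * above sets its consider_param_startup, letting add_path() retain
 * parameterized paths on "b" that are merely cheap to start up.
 */
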
/*
|
|
|
|
* set_base_rel_sizes
|
|
|
|
* Set the size estimates (rows and widths) for each base-relation entry.
|
2016-06-10 00:02:36 +02:00
|
|
|
* Also determine whether to consider parallel paths for base relations.
|
2012-01-28 01:26:38 +01:00
|
|
|
*
|
|
|
|
* We do this in a separate pass over the base rels so that rowcount
|
Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker. Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation. That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL. The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now. Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0. Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 15:02:52 +01:00
|
|
|
* estimates are available for parameterized path generation, and also so
|
2016-07-03 23:57:28 +02:00
|
|
|
* that each rel's consider_parallel flag is set correctly before we begin to
|
Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker. Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation. That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL. The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now. Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0. Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 15:02:52 +01:00
|
|
|
* generate paths.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
static void
set_base_rel_sizes(PlannerInfo *root)
{
	Index		rti;

	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		RelOptInfo *rel = root->simple_rel_array[rti];
		RangeTblEntry *rte;

		/* there may be empty slots corresponding to non-baserel RTEs */
		if (rel == NULL)
			continue;

		Assert(rel->relid == rti);	/* sanity check on array */

		/* ignore RTEs that are "other rels" */
		if (rel->reloptkind != RELOPT_BASEREL)
			continue;

		rte = root->simple_rte_array[rti];

		/*
		 * If parallelism is allowable for this query in general, see whether
		 * it's allowable for this rel in particular.  We have to do this
		 * before set_rel_size(), because (a) if this rel is an inheritance
		 * parent, set_append_rel_size() will use and perhaps change the
		 * rel's consider_parallel flag, and (b) for some RTE types,
		 * set_rel_size() goes ahead and makes paths immediately.
		 */
		if (root->glob->parallelModeOK)
			set_rel_consider_parallel(root, rel, rte);

		set_rel_size(root, rel, rti, rte);
	}
}

/*
 * set_base_rel_pathlists
 *	  Finds all paths available for scanning each base-relation entry.
 *	  Sequential scan and any available indices are considered.
 *	  Each useful path is attached to its relation's 'pathlist' field.
 */
static void
set_base_rel_pathlists(PlannerInfo *root)
{
	Index		rti;

	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		RelOptInfo *rel = root->simple_rel_array[rti];

		/* there may be empty slots corresponding to non-baserel RTEs */
		if (rel == NULL)
			continue;

		Assert(rel->relid == rti);	/* sanity check on array */

		/* ignore RTEs that are "other rels" */
		if (rel->reloptkind != RELOPT_BASEREL)
			continue;

		set_rel_pathlist(root, rel, rti, root->simple_rte_array[rti]);
	}
}

/*
 * set_rel_size
 *	  Set size estimates for a base relation
 */
static void
set_rel_size(PlannerInfo *root, RelOptInfo *rel,
			 Index rti, RangeTblEntry *rte)
{
	if (rel->reloptkind == RELOPT_BASEREL &&
		relation_excluded_by_constraints(root, rel, rte))
	{
		/*
		 * We proved we don't need to scan the rel via constraint exclusion,
		 * so set up a single dummy path for it.  Here we only check this for
		 * regular baserels; if it's an otherrel, CE was already checked in
		 * set_append_rel_size().
		 *
		 * In this case, we go ahead and set up the relation's path right
		 * away instead of leaving it for set_rel_pathlist to do.  This is
		 * because we don't have a convention for marking a rel as dummy
		 * except by assigning a dummy path to it.
		 */
		set_dummy_rel_pathlist(rel);
	}
	else if (rte->inh)
	{
		/* It's an "append relation", process accordingly */
		set_append_rel_size(root, rel, rti, rte);
	}
	else
	{
		switch (rel->rtekind)
		{
			case RTE_RELATION:
				if (rte->relkind == RELKIND_FOREIGN_TABLE)
				{
					/* Foreign table */
					set_foreign_size(root, rel, rte);
				}
				else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
				{
					/*
					 * We could get here if asked to scan a partitioned table
					 * with ONLY.  In that case we shouldn't scan any of the
					 * partitions, so mark it as a dummy rel.
					 */
					set_dummy_rel_pathlist(rel);
				}
				else if (rte->tablesample != NULL)
				{
					/* Sampled relation */
					set_tablesample_rel_size(root, rel, rte);
				}
				else
				{
					/* Plain relation */
					set_plain_rel_size(root, rel, rte);
				}
				break;
			case RTE_SUBQUERY:

				/*
				 * Subqueries don't support making a choice between
				 * parameterized and unparameterized paths, so just go ahead
				 * and build their paths immediately.
				 */
				set_subquery_pathlist(root, rel, rti, rte);
				break;
			case RTE_FUNCTION:
				set_function_size_estimates(root, rel);
				break;
			case RTE_TABLEFUNC:
				set_tablefunc_size_estimates(root, rel);
				break;
			case RTE_VALUES:
				set_values_size_estimates(root, rel);
				break;
			case RTE_CTE:

				/*
				 * CTEs don't support making a choice between parameterized
				 * and unparameterized paths, so just go ahead and build
				 * their paths immediately.
				 */
				if (rte->self_reference)
					set_worktable_pathlist(root, rel, rte);
				else
					set_cte_pathlist(root, rel, rte);
				break;
			case RTE_NAMEDTUPLESTORE:
				/* Might as well just build the path immediately */
				set_namedtuplestore_pathlist(root, rel, rte);
				break;
			case RTE_RESULT:
				/* Might as well just build the path immediately */
				set_result_pathlist(root, rel, rte);
				break;
			default:
				elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
				break;
		}
	}

	/*
	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
	 */
	Assert(rel->rows > 0 || IS_DUMMY_REL(rel));
}

/*
 * set_rel_pathlist
 *	  Build access paths for a base relation
 */
static void
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
				 Index rti, RangeTblEntry *rte)
{
	if (IS_DUMMY_REL(rel))
	{
		/* We already proved the relation empty, so nothing more to do */
	}
	else if (rte->inh)
	{
		/* It's an "append relation", process accordingly */
		set_append_rel_pathlist(root, rel, rti, rte);
	}
	else
	{
		switch (rel->rtekind)
		{
			case RTE_RELATION:
				if (rte->relkind == RELKIND_FOREIGN_TABLE)
				{
					/* Foreign table */
					set_foreign_pathlist(root, rel, rte);
				}
				else if (rte->tablesample != NULL)
				{
					/* Sampled relation */
					set_tablesample_rel_pathlist(root, rel, rte);
				}
				else
				{
					/* Plain relation */
					set_plain_rel_pathlist(root, rel, rte);
				}
				break;
			case RTE_SUBQUERY:
				/* Subquery --- fully handled during set_rel_size */
				break;
			case RTE_FUNCTION:
				/* RangeFunction */
				set_function_pathlist(root, rel, rte);
				break;
			case RTE_TABLEFUNC:
				/* Table Function */
				set_tablefunc_pathlist(root, rel, rte);
				break;
			case RTE_VALUES:
				/* Values list */
				set_values_pathlist(root, rel, rte);
				break;
			case RTE_CTE:
				/* CTE reference --- fully handled during set_rel_size */
				break;
			case RTE_NAMEDTUPLESTORE:
				/* tuplestore reference --- fully handled during set_rel_size */
				break;
			case RTE_RESULT:
				/* simple Result --- fully handled during set_rel_size */
				break;
			default:
				elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
				break;
		}
	}

	/*
	 * Allow a plugin to editorialize on the set of Paths for this base
	 * relation.  It could add new paths (such as CustomPaths) by calling
	 * add_path(), or add_partial_path() if parallel aware.  It could also
	 * delete or modify paths added by the core code.
	 */
	if (set_rel_pathlist_hook)
		(*set_rel_pathlist_hook) (root, rel, rti, rte);
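
	/*
	 * Illustration only (not part of this file's logic): an extension would
	 * typically install this hook from its _PG_init(), saving and chaining
	 * any previous value.  A minimal sketch, using the hypothetical names
	 * "my_rel_pathlist" and "prev_set_rel_pathlist_hook":
	 *
	 *		static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook;
	 *
	 *		static void
	 *		my_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
	 *						Index rti, RangeTblEntry *rte)
	 *		{
	 *			if (prev_set_rel_pathlist_hook)
	 *				prev_set_rel_pathlist_hook(root, rel, rti, rte);
	 *			... add_path() or add_partial_path() calls go here ...
	 *		}
	 *
	 *		void
	 *		_PG_init(void)
	 *		{
	 *			prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
	 *			set_rel_pathlist_hook = my_rel_pathlist;
	 *		}
	 */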

	/*
	 * If this is a baserel, we should normally consider gathering any
	 * partial paths we may have created for it.  We have to do this after
	 * calling the set_rel_pathlist_hook, else it cannot add partial paths to
	 * be included here.
	 *
	 * However, if this is an inheritance child, skip it.  Otherwise, we
	 * could end up with a very large number of gather nodes, each trying to
	 * grab its own pool of workers.  Instead, we'll consider gathering
	 * partial paths for the parent appendrel.
	 *
	 * Also, if this is the topmost scan/join rel (that is, the only
	 * baserel), we postpone gathering until the final scan/join targetlist
	 * is available (see grouping_planner).
	 */
	if (rel->reloptkind == RELOPT_BASEREL &&
		bms_membership(root->all_baserels) != BMS_SINGLETON)
		generate_useful_gather_paths(root, rel, false);

	/* Now find the cheapest of the paths for this rel */
	set_cheapest(rel);

#ifdef OPTIMIZER_DEBUG
	debug_print_rel(root, rel);
#endif
}

/*
 * set_plain_rel_size
 *	  Set size estimates for a plain relation (no subquery, no inheritance)
 */
static void
set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
	/*
	 * Test any partial indexes of rel for applicability.  We must do this
	 * first since partial unique indexes can affect size estimates.
	 */
	check_index_predicates(root, rel);

	/* Mark rel with estimated output rows, width, etc */
	set_baserel_size_estimates(root, rel);
}

/*
 * If this relation could possibly be scanned from within a worker, then set
 * its consider_parallel flag.
 */
static void
set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
						  RangeTblEntry *rte)
{
	/*
	 * The flag has previously been initialized to false, so we can just
	 * return if it becomes clear that we can't safely set it.
	 */
	Assert(!rel->consider_parallel);

	/* Don't call this if parallelism is disallowed for the entire query. */
	Assert(root->glob->parallelModeOK);

	/* This should only be called for baserels and appendrel children. */
	Assert(IS_SIMPLE_REL(rel));

	/* Assorted checks based on rtekind. */
	switch (rte->rtekind)
	{
		case RTE_RELATION:

			/*
			 * Currently, parallel workers can't access the leader's
			 * temporary tables.  We could possibly relax this if we wrote
			 * all of its local buffers at the start of the query and made no
			 * changes thereafter (maybe we could allow hint bit changes),
			 * and if we taught the workers to read them.  Writing a large
			 * number of temporary buffers could be expensive, though, and we
			 * don't have the rest of the necessary infrastructure right now
			 * anyway.  So for now, bail out if we see a temporary table.
			 */
			if (get_rel_persistence(rte->relid) == RELPERSISTENCE_TEMP)
				return;

			/*
			 * Table sampling can be pushed down to workers if the sample
			 * function and its arguments are safe.
			 */
			if (rte->tablesample != NULL)
			{
				char		proparallel = func_parallel(rte->tablesample->tsmhandler);

				if (proparallel != PROPARALLEL_SAFE)
					return;
				if (!is_parallel_safe(root, (Node *) rte->tablesample->args))
					return;
			}
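
			/*
			 * For example (illustrative, not checked here): the built-in
			 * SYSTEM and BERNOULLI sampling methods have parallel-safe
			 * handlers, so a query such as
			 *
			 *		SELECT * FROM t TABLESAMPLE SYSTEM (10);
			 *
			 * passes both tests (its argument is a simple Const), while a
			 * sampling method whose handler or arguments are merely
			 * parallel-restricted causes us to bail out just above.
			 */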

			/*
			 * Ask FDWs whether they can support performing a ForeignScan
			 * within a worker.  Most often, the answer will be no.  For
			 * example, if the nature of the FDW is such that it opens a TCP
			 * connection with a remote server, each parallel worker would
			 * end up with a separate connection, and these connections might
			 * not be appropriately coordinated between workers and the
			 * leader.
			 */
			if (rte->relkind == RELKIND_FOREIGN_TABLE)
			{
				Assert(rel->fdwroutine);
				if (!rel->fdwroutine->IsForeignScanParallelSafe)
					return;
				if (!rel->fdwroutine->IsForeignScanParallelSafe(root, rel, rte))
					return;
			}
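
			/*
			 * Illustration only: an FDW whose scans are safe to run in a
			 * worker opts in by supplying this callback from its handler
			 * function.  A minimal sketch, with the hypothetical name
			 * "myIsForeignScanParallelSafe":
			 *
			 *		static bool
			 *		myIsForeignScanParallelSafe(PlannerInfo *root,
			 *									RelOptInfo *rel,
			 *									RangeTblEntry *rte)
			 *		{
			 *			return true;
			 *		}
			 *
			 *		...
			 *		fdwroutine->IsForeignScanParallelSafe =
			 *			myIsForeignScanParallelSafe;
			 */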

			/*
			 * There are additional considerations for appendrels, which
			 * we'll deal with in set_append_rel_size and
			 * set_append_rel_pathlist.  For now, just set consider_parallel
			 * based on the rel's own quals and targetlist.
			 */
			break;

		case RTE_SUBQUERY:

			/*
			 * There's no intrinsic problem with scanning a subquery-in-FROM
			 * (as distinct from a SubPlan or InitPlan) in a parallel worker.
			 * If the subquery doesn't happen to have any parallel-safe
			 * paths, then flagging it as consider_parallel won't change
			 * anything, but that's true for plain tables, too.  We must set
			 * consider_parallel based on the rel's own quals and targetlist,
			 * so that if a subquery path is parallel-safe but the quals and
			 * projection we're sticking onto it are not, we correctly mark
			 * the SubqueryScanPath as not parallel-safe.  (Note that
			 * set_subquery_pathlist() might push some of these quals down
			 * into the subquery itself, but that doesn't change anything.)
			 *
			 * We can't push a sub-select containing LIMIT/OFFSET to workers,
			 * as there is no guarantee that the row order will be fully
			 * deterministic, and applying LIMIT/OFFSET would then lead to
			 * inconsistent results at the top level.  (In some cases, where
			 * the result is ordered, we could relax this restriction.  But
			 * it doesn't currently seem worth expending extra effort to do
			 * so.)
			 */
			{
				Query	   *subquery = castNode(Query, rte->subquery);

				if (limit_needed(subquery))
					return;
			}
|
2016-07-04 00:24:49 +02:00
|
|
|
break;
|
2015-11-11 15:02:52 +01:00
|
|
|
|
|
|
|
case RTE_JOIN:
|
|
|
|
/* Shouldn't happen; we're only considering baserels here. */
|
|
|
|
Assert(false);
|
|
|
|
return;
|
|
|
|
|
|
|
|
case RTE_FUNCTION:
|
|
|
|
/* Check for parallel-restricted functions. */
|
2016-08-19 20:03:07 +02:00
|
|
|
if (!is_parallel_safe(root, (Node *) rte->functions))
|
2015-11-11 15:02:52 +01:00
|
|
|
return;
|
|
|
|
break;
|
|
|
|
|
2017-03-08 16:39:37 +01:00
|
|
|
case RTE_TABLEFUNC:
|
|
|
|
/* not parallel safe */
|
|
|
|
return;
|
|
|
|
|
2015-11-11 15:02:52 +01:00
|
|
|
case RTE_VALUES:
|
2016-08-19 20:35:32 +02:00
|
|
|
/* Check for parallel-restricted functions. */
|
|
|
|
if (!is_parallel_safe(root, (Node *) rte->values_lists))
|
|
|
|
return;
|
2015-11-11 15:02:52 +01:00
|
|
|
break;
|
|
|
|
|
|
|
|
case RTE_CTE:
|
2016-06-10 00:02:36 +02:00
|
|
|
|
2015-11-11 15:02:52 +01:00
|
|
|
/*
|
|
|
|
* CTE tuplestores aren't shared among parallel workers, so we
|
|
|
|
* force all CTE scans to happen in the leader. Also, populating
|
|
|
|
* the CTE would require executing a subplan that's not available
|
|
|
|
* in the worker, might be parallel-restricted, and must get
|
|
|
|
* executed only once.
|
|
|
|
*/
|
|
|
|
return;
|
2017-04-01 06:17:18 +02:00
|
|
|
|
|
|
|
case RTE_NAMEDTUPLESTORE:
|
2017-05-17 22:31:56 +02:00
|
|
|
|
2017-04-01 06:17:18 +02:00
|
|
|
/*
|
|
|
|
* A named tuplestore cannot be shared, at least without more
|
|
|
|
* infrastructure to support that.
|
|
|
|
*/
|
|
|
|
return;
|
In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
|
|
|
|
|
|
|
case RTE_RESULT:
|
|
|
|
/* RESULT RTEs, in themselves, are no problem. */
|
|
|
|
break;
|
2015-11-11 15:02:52 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2016-06-10 00:02:36 +02:00
|
|
|
* If there's anything in baserestrictinfo that's parallel-restricted, we
|
|
|
|
* give up on parallelizing access to this relation. We could consider
|
2015-11-11 15:02:52 +01:00
|
|
|
* instead postponing application of the restricted quals until we're
|
|
|
|
* above all the parallelism in the plan tree, but it's not clear that
|
2016-07-03 23:57:28 +02:00
|
|
|
* that would be a win in very many cases, and it might be tricky to make
|
|
|
|
* outer join clauses work correctly. It would likely break equivalence
|
|
|
|
* classes, too.
|
2015-11-11 15:02:52 +01:00
|
|
|
*/
|
2016-08-19 20:03:07 +02:00
|
|
|
if (!is_parallel_safe(root, (Node *) rel->baserestrictinfo))
|
2015-11-11 15:02:52 +01:00
|
|
|
return;
|
|
|
|
|
2016-06-09 18:40:23 +02:00
|
|
|
/*
|
2016-06-10 22:20:03 +02:00
|
|
|
* Likewise, if the relation's outputs are not parallel-safe, give up.
|
|
|
|
* (Usually, they're just Vars, but sometimes they're not.)
|
2016-06-09 18:40:23 +02:00
|
|
|
*/
|
2016-08-19 20:03:07 +02:00
|
|
|
if (!is_parallel_safe(root, (Node *) rel->reltarget->exprs))
|
2016-06-09 18:40:23 +02:00
|
|
|
return;
|
|
|
|
|
2015-11-11 15:02:52 +01:00
|
|
|
/* We have a winner. */
|
|
|
|
rel->consider_parallel = true;
|
|
|
|
}
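A hedged aside, not part of allpaths.c: roughly what the limit_needed()
test used in the RTE_SUBQUERY case above decides. The real function lives
in planner.c; the helper name and the simplifications here are ours.

/*
 * Sketch of limit_needed(): a non-ignorable LIMIT or OFFSET forces the
 * sub-select to stay in the leader.  LIMIT ALL (a NULL constant) and
 * OFFSET 0 can be ignored; non-constant expressions are treated
 * conservatively.
 */
static bool
sketch_limit_needed(Query *parse)
{
	Node	   *node;

	node = parse->limitCount;
	if (node)
	{
		/* LIMIT ALL is represented as a NULL constant */
		if (!IsA(node, Const) || !((Const *) node)->constisnull)
			return true;
	}

	node = parse->limitOffset;
	if (node)
	{
		if (IsA(node, Const))
		{
			Const	   *c = (Const *) node;

			/* OFFSET NULL and OFFSET 0 are no-ops */
			if (!c->constisnull && DatumGetInt64(c->constvalue) != 0)
				return true;
		}
		else
			return true;	/* non-constant OFFSET: assume it matters */
	}

	return false;
}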
|
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
/*
|
|
|
|
* set_plain_rel_pathlist
|
|
|
|
* Build access paths for a plain relation (no subquery, no inheritance)
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
|
|
|
|
{
|
2012-08-27 04:48:55 +02:00
|
|
|
Relids required_outer;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't support pushing join clauses into the quals of a seqscan, but
|
|
|
|
* it could still have required parameterization due to LATERAL refs in
|
2013-08-18 02:22:37 +02:00
|
|
|
* its tlist.
|
2012-08-27 04:48:55 +02:00
|
|
|
*/
|
|
|
|
required_outer = rel->lateral_relids;
|
|
|
|
|
2000-11-12 01:37:02 +01:00
|
|
|
/* Consider sequential scan */
|
2015-11-11 14:57:52 +01:00
|
|
|
add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
|
2000-09-29 20:21:41 +02:00
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/* If appropriate, consider parallel sequential scan */
|
|
|
|
if (rel->consider_parallel && required_outer == NULL)
|
2016-04-30 18:29:21 +02:00
|
|
|
create_plain_partial_paths(root, rel);
|
2015-11-11 15:02:52 +01:00
|
|
|
|
2005-04-25 03:30:14 +02:00
|
|
|
/* Consider index scans */
|
|
|
|
create_index_paths(root, rel);
|
|
|
|
|
2000-11-12 01:37:02 +01:00
|
|
|
/* Consider TID scans */
|
|
|
|
create_tidscan_paths(root, rel);
|
|
|
|
}
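As context for the cost_gather fix mentioned in the commit message above,
here is a hedged sketch of the corrected costing; the real function is
cost_gather() in costsize.c, and the helper name and abbreviations are ours.

/*
 * Sketch of the fixed Gather costing: the leader-transfer overhead is
 * charged per tuple the path actually returns (path->path.rows), not
 * per tuple in the underlying relation (rel->tuples).
 */
static void
sketch_cost_gather(GatherPath *path)
{
	Cost		startup_cost = path->subpath->startup_cost;
	Cost		run_cost;

	run_cost = path->subpath->total_cost - path->subpath->startup_cost;

	/* Parallel setup and per-tuple communication cost. */
	startup_cost += parallel_setup_cost;
	run_cost += parallel_tuple_cost * path->path.rows;	/* not rel->tuples */

	path->path.startup_cost = startup_cost;
	path->path.total_cost = startup_cost + run_cost;
}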
|
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/*
|
2016-04-30 18:29:21 +02:00
|
|
|
* create_plain_partial_paths
|
|
|
|
* Build partial access paths for parallel scan of a plain relation
|
2016-01-20 20:29:22 +01:00
|
|
|
*/
|
|
|
|
static void
|
2016-04-30 18:29:21 +02:00
|
|
|
create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
|
2016-01-20 20:29:22 +01:00
|
|
|
{
|
2016-06-09 17:16:26 +02:00
|
|
|
int parallel_workers;
|
2016-01-20 20:29:22 +01:00
|
|
|
|
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 19:25:55 +01:00
|
|
|
parallel_workers = compute_parallel_worker(rel, rel->pages, -1,
|
|
|
|
max_parallel_workers_per_gather);
|
2016-06-09 17:16:26 +02:00
|
|
|
|
|
|
|
/* If any limit was set to zero, the user doesn't want a parallel scan. */
|
|
|
|
if (parallel_workers <= 0)
|
|
|
|
return;
|
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/* Add an unordered partial path based on a parallel sequential scan. */
|
2016-06-09 15:08:27 +02:00
|
|
|
add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_workers));
|
2016-01-20 20:29:22 +01:00
|
|
|
}
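For readers wondering what compute_parallel_worker() does with rel->pages
above, a hedged sketch of its heap-size heuristic follows. It is simplified
(the real function also considers index pages and the per-relation
parallel_workers reloption), and the helper name is ours.

/*
 * Sketch of the worker-count heuristic: no workers below
 * min_parallel_table_scan_size, one worker at that threshold, and one
 * more for each further tripling of the table's size, clamped to the
 * caller's maximum.
 */
static int
sketch_heap_parallel_workers(BlockNumber heap_pages, int max_workers)
{
	int			workers;
	BlockNumber threshold = Max(min_parallel_table_scan_size, 1);

	/* Too small to bother with parallelism at all. */
	if (heap_pages < threshold)
		return 0;

	workers = 1;
	while (heap_pages >= threshold * 3)
	{
		workers++;
		threshold *= 3;
	}

	return Min(workers, max_workers);
}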
|
|
|
|
|
2015-05-15 20:37:10 +02:00
|
|
|
/*
|
|
|
|
* set_tablesample_rel_size
|
2015-07-25 20:39:00 +02:00
|
|
|
* Set size estimates for a sampled relation
|
2015-05-15 20:37:10 +02:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
|
|
|
|
{
|
2015-07-25 20:39:00 +02:00
|
|
|
TableSampleClause *tsc = rte->tablesample;
|
|
|
|
TsmRoutine *tsm;
|
|
|
|
BlockNumber pages;
|
|
|
|
double tuples;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Test any partial indexes of rel for applicability. We must do this
|
|
|
|
* first since partial unique indexes can affect size estimates.
|
|
|
|
*/
|
Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate. This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all. So index-only scans were
essentially unavailable with such partial indexes.
To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner. This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes. However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes. We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed. We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.
Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 20:48:56 +02:00
|
|
|
check_index_predicates(root, rel);
|
2015-07-25 20:39:00 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Call the sampling method's estimation function to estimate the number
|
|
|
|
* of pages it will read and the number of tuples it will return. (Note:
|
|
|
|
* we assume the function returns sane values.)
|
|
|
|
*/
|
|
|
|
tsm = GetTsmRoutine(tsc->tsmhandler);
|
|
|
|
tsm->SampleScanGetSampleSize(root, rel, tsc->args,
|
|
|
|
&pages, &tuples);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For the moment, because we will only consider a SampleScan path for the
|
|
|
|
* rel, it's okay to just overwrite the pages and tuples estimates for the
|
|
|
|
* whole relation. If we ever consider multiple path types for sampled
|
|
|
|
* rels, we'll need something more sophisticated.
|
|
|
|
*/
|
|
|
|
rel->pages = pages;
|
|
|
|
rel->tuples = tuples;
|
|
|
|
|
2015-05-15 20:37:10 +02:00
|
|
|
/* Mark rel with estimated output rows, width, etc */
|
|
|
|
set_baserel_size_estimates(root, rel);
|
|
|
|
}
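To illustrate what a sampling method's SampleScanGetSampleSize callback
computes, here is a hedged, BERNOULLI-flavored sketch; the helper name and
the precomputed samplefract argument are ours (the real methods, in
src/backend/access/tablesample/, also parse their argument expressions).

/*
 * Sketch of a BERNOULLI-style size estimate: every page must still be
 * visited, but only the sampled fraction of the tuples comes back.
 * (A SYSTEM-style method would scale the page count too, since it
 * skips whole blocks.)
 */
static void
sketch_bernoulli_samplesize(RelOptInfo *baserel, double samplefract,
							BlockNumber *pages, double *tuples)
{
	*pages = baserel->pages;	/* BERNOULLI reads the whole heap */
	*tuples = clamp_row_est(baserel->tuples * samplefract);
}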
|
|
|
|
|
|
|
|
/*
|
|
|
|
* set_tablesample_rel_pathlist
|
|
|
|
* Build access paths for a sampled relation
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
|
|
|
|
{
|
2015-05-24 03:35:49 +02:00
|
|
|
Relids required_outer;
|
|
|
|
Path *path;
|
2015-05-15 20:37:10 +02:00
|
|
|
|
|
|
|
/*
|
2015-07-25 20:39:00 +02:00
|
|
|
* We don't support pushing join clauses into the quals of a samplescan,
|
|
|
|
* but it could still have required parameterization due to LATERAL refs
|
|
|
|
* in its tlist or TABLESAMPLE arguments.
|
2015-05-15 20:37:10 +02:00
|
|
|
*/
|
|
|
|
required_outer = rel->lateral_relids;
|
|
|
|
|
2015-07-25 20:39:00 +02:00
|
|
|
/* Consider sampled scan */
|
2015-05-15 20:37:10 +02:00
|
|
|
path = create_samplescan_path(root, rel, required_outer);
|
2015-07-25 20:39:00 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the sampling method does not support repeatable scans, we must avoid
|
|
|
|
* plans that would scan the rel multiple times. Ideally, we'd simply
|
|
|
|
* avoid putting the rel on the inside of a nestloop join; but adding such
|
|
|
|
* a consideration to the planner seems like a great deal of complication
|
|
|
|
* to support an uncommon usage of second-rate sampling methods. Instead,
|
|
|
|
* if there is a risk that the query might perform an unsafe join, just
|
|
|
|
* wrap the SampleScan in a Materialize node. We can check for joins by
|
|
|
|
* counting the membership of all_baserels (note that this correctly
|
|
|
|
* counts inheritance trees as single rels). If we're inside a subquery,
|
|
|
|
* we can't easily check whether a join might occur in the outer query, so
|
|
|
|
* just assume one is possible.
|
|
|
|
*
|
|
|
|
* GetTsmRoutine is relatively expensive compared to the other tests here,
|
|
|
|
* so check repeatable_across_scans last, even though that's a bit odd.
|
|
|
|
*/
|
|
|
|
if ((root->query_level > 1 ||
|
|
|
|
bms_membership(root->all_baserels) != BMS_SINGLETON) &&
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
!(GetTsmRoutine(rte->tablesample->tsmhandler)->repeatable_across_scans))
|
2015-07-25 20:39:00 +02:00
|
|
|
{
|
|
|
|
path = (Path *) create_material_path(rel, path);
|
|
|
|
}
|
|
|
|
|
|
|
|
add_path(rel, path);
|
|
|
|
|
|
|
|
/* For the moment, at least, there are no other paths to consider */
|
2015-05-15 20:37:10 +02:00
|
|
|
}
|
|
|
|
|
2000-11-12 01:37:02 +01:00
|
|
|
/*
|
2012-01-28 01:26:38 +01:00
|
|
|
* set_foreign_size
|
|
|
|
* Set size estimates for a foreign table RTE
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
|
|
|
|
{
|
|
|
|
/* Mark rel with estimated output rows, width, etc */
|
|
|
|
set_foreign_size_estimates(root, rel);
|
Revise FDW planning API, again.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path. Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.
Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.
In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution. We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker). Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)
Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
2012-03-09 18:48:48 +01:00
|
|
|
|
|
|
|
/* Let FDW adjust the size estimates, if it can */
|
|
|
|
rel->fdwroutine->GetForeignRelSize(root, rel, rte->relid);
|
Make entirely-dummy appendrels get marked as such in set_append_rel_size.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL). When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount. Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist. (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)
In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.
Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.
Per report from Andreas Seltenreich. Back-patch to 9.2. Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist. It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
2015-07-26 22:19:08 +02:00
|
|
|
|
|
|
|
/* ... but do not let it set the rows estimate to zero */
|
|
|
|
rel->rows = clamp_row_est(rel->rows);
|
2012-01-28 01:26:38 +01:00
|
|
|
}
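For context on the callback invoked above, a hedged sketch of a minimal
GetForeignRelSize implementation; the constant rowcount is a placeholder of
ours, and real wrappers (file_fdw, postgres_fdw) estimate from their data
source instead.

/*
 * Sketch of a trivial GetForeignRelSize callback: supply a rowcount
 * estimate and stash any planning state for the later GetForeignPaths
 * call.  Note that set_foreign_size() clamps whatever we set here.
 */
static void
sketchGetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel,
						Oid foreigntableid)
{
	baserel->rows = 1000;		/* placeholder; a real FDW would estimate */
	baserel->fdw_private = NULL;	/* per-rel planning state goes here */
}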
|
|
|
|
|
|
|
|
/*
|
|
|
|
* set_foreign_pathlist
|
2012-03-05 22:15:59 +01:00
|
|
|
* Build access paths for a foreign table RTE
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
|
|
|
|
{
|
2012-03-09 18:48:48 +01:00
|
|
|
/* Call the FDW's GetForeignPaths function to generate path(s) */
|
|
|
|
rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
|
2012-01-28 01:26:38 +01:00
|
|
|
}
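And a matching hedged sketch of a GetForeignPaths callback that adds a
single full-scan ForeignPath; the costs are invented placeholders, and a
real wrapper would derive them from its size estimates and any pushed-down
quals.

/*
 * Sketch of a minimal GetForeignPaths callback: one ForeignPath
 * covering the whole remote scan, with made-up costs.
 */
static void
sketchGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel,
					  Oid foreigntableid)
{
	Cost		startup_cost = 10.0;	/* assumed placeholder */
	Cost		total_cost = startup_cost + baserel->rows;

	add_path(baserel, (Path *)
			 create_foreignscan_path(root, baserel,
									 NULL,	/* default pathtarget */
									 baserel->rows,
									 startup_cost,
									 total_cost,
									 NIL,	/* no pathkeys */
									 NULL,	/* no required outer rels */
									 NULL,	/* no fdw_outerpath */
									 NIL));	/* no fdw_private */
}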
|
|
|
|
|
|
|
|
/*
|
|
|
|
* set_append_rel_size
|
Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-04 04:41:31 +02:00
|
|
|
* Set size estimates for a simple "append relation"
|
2000-11-12 01:37:02 +01:00
|
|
|
*
|
2014-05-06 18:12:18 +02:00
|
|
|
* The passed-in rel and RTE represent the entire append relation. The
|
2017-06-22 16:52:25 +02:00
|
|
|
* relation's contents are computed by appending together the output of the
|
|
|
|
* individual member relations. Note that in the non-partitioned inheritance
|
|
|
|
* case, the first member relation is actually the same table as is mentioned
|
|
|
|
* in the parent RTE ... but it has a different RTE and RelOptInfo. This is
|
2006-01-31 22:39:25 +01:00
|
|
|
* a good thing because their outputs are not the same size.
|
2000-11-12 01:37:02 +01:00
|
|
|
*/
|
|
|
|
static void
|
2012-01-28 01:26:38 +01:00
|
|
|
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
|
|
|
|
Index rti, RangeTblEntry *rte)
|
2000-11-12 01:37:02 +01:00
|
|
|
{
|
2001-05-20 22:28:20 +02:00
|
|
|
int parentRTindex = rti;
|
2015-07-26 22:19:08 +02:00
|
|
|
bool has_live_children;
|
2008-06-27 05:56:55 +02:00
|
|
|
double parent_rows;
|
|
|
|
double parent_size;
|
|
|
|
double *parent_attrsizes;
|
|
|
|
int nattrs;
|
2006-01-31 22:39:25 +01:00
|
|
|
ListCell *l;
|
2000-11-12 01:37:02 +01:00
|
|
|
|
2017-09-14 21:41:08 +02:00
|
|
|
/* Guard against stack overflow due to overly deep inheritance tree. */
|
|
|
|
check_stack_depth();
|
|
|
|
|
2017-04-04 04:41:31 +02:00
|
|
|
Assert(IS_SIMPLE_REL(rel));

    /*
     * Initialize partitioned_child_rels to contain this RT index.
     *
     * Note that during the set_append_rel_pathlist() phase, we will bubble up
     * the indexes of partitioned relations that appear down in the tree, so
     * that when we've created Paths for all the children, the root
     * partitioned table's list will contain all such indexes.
     */
    if (rte->relkind == RELKIND_PARTITIONED_TABLE)
        rel->partitioned_child_rels = list_make1_int(rti);

    /*
     * If this is a partitioned baserel, set the consider_partitionwise_join
     * flag; currently, we only consider partitionwise joins with the baserel
     * if its targetlist doesn't contain a whole-row Var.
     */
    if (enable_partitionwise_join &&
        rel->reloptkind == RELOPT_BASEREL &&
        rte->relkind == RELKIND_PARTITIONED_TABLE &&
        rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL)
        rel->consider_partitionwise_join = true;
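
    /*
     * (Clarifying note, added for readability: attr_needed is indexed by
     * attno - min_attr, and a whole-row reference is recorded under
     * InvalidAttrNumber (0), so the last test above asks whether anything
     * still needs a whole-row Var of this baserel.)
     */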

    /*
     * Initialize to compute size estimates for whole append relation.
     *
     * We handle width estimates by weighting the widths of different child
     * rels proportionally to their number of rows.  This is sensible because
     * the use of width estimates is mainly to compute the total relation
     * "footprint" if we have to sort or hash it.  To do this, we sum the
     * total equivalent size (in "double" arithmetic) and then divide by the
     * total rowcount estimate.  This is done separately for the total rel
     * width and each attribute.
     *
     * Note: if you consider changing this logic, beware that child rels could
     * have zero rows and/or width, if they were excluded by constraints.
     */
    has_live_children = false;
    parent_rows = 0;
    parent_size = 0;
    nattrs = rel->max_attr - rel->min_attr + 1;
    parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
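
    /*
     * (Illustrative example of the weighting scheme: if one child is
     * estimated at 1000 rows of width 40 and another at 3000 rows of width
     * 80, the blended width is (40*1000 + 80*3000) / 4000 = 70, i.e. the
     * wider but larger child dominates the average.)
     */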

    foreach(l, root->append_rel_list)
    {
        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
        int         childRTindex;
        RangeTblEntry *childRTE;
        RelOptInfo *childrel;
        ListCell   *parentvars;
        ListCell   *childvars;

        /* append_rel_list contains all append rels; ignore others */
        if (appinfo->parent_relid != parentRTindex)
            continue;

        childRTindex = appinfo->child_relid;
        childRTE = root->simple_rte_array[childRTindex];

        /*
         * The child rel's RelOptInfo was already created during
         * add_other_rels_to_query.
         */
        childrel = find_base_rel(root, childRTindex);
        Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);

        /* We may have already proven the child to be dummy. */
        if (IS_DUMMY_REL(childrel))
            continue;

        /*
         * We have to copy the parent's targetlist and quals to the child,
         * with appropriate substitution of variables.  However, the
         * baserestrictinfo quals were already copied/substituted when the
         * child RelOptInfo was built.  So we don't need any additional setup
         * before applying constraint exclusion.
         */
        if (relation_excluded_by_constraints(root, childrel, childRTE))
        {
            /*
             * This child need not be scanned, so we can omit it from the
             * appendrel.
             */
            set_dummy_rel_pathlist(childrel);
            continue;
        }

        /*
         * Constraint exclusion failed, so copy the parent's join quals and
         * targetlist to the child, with appropriate variable substitutions.
         *
         * NB: the resulting childrel->reltarget->exprs may contain arbitrary
         * expressions, which otherwise would not occur in a rel's targetlist.
         * Code that might be looking at an appendrel child must cope with
         * such.  (Normally, a rel's targetlist would only include Vars and
         * PlaceHolderVars.)  XXX we do not bother to update the cost or width
         * fields of childrel->reltarget; not clear if that would be useful.
         */
        childrel->joininfo = (List *)
            adjust_appendrel_attrs(root,
                                   (Node *) rel->joininfo,
                                   1, &appinfo);
        childrel->reltarget->exprs = (List *)
            adjust_appendrel_attrs(root,
                                   (Node *) rel->reltarget->exprs,
                                   1, &appinfo);
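
        /*
         * (Clarifying note: adjust_appendrel_attrs() walks the given
         * expression tree and, using appinfo's translated_vars list, replaces
         * each Var of the parent rel with the corresponding child expression;
         * e.g. a Var for the parent's column k becomes the child's matching
         * column, which may have a different attribute number.)
         */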

        /*
         * We have to make child entries in the EquivalenceClass data
         * structures as well.  This is needed either if the parent
         * participates in some eclass joins (because we will want to consider
         * inner-indexscan joins on the individual children) or if the parent
         * has useful pathkeys (because we should try to build MergeAppend
         * paths that produce those sort orderings).
         */
        if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
            add_child_rel_equivalences(root, appinfo, rel, childrel);
        childrel->has_eclass_joins = rel->has_eclass_joins;

        /*
         * Note: we could compute appropriate attr_needed data for the child's
         * variables, by transforming the parent's attr_needed through the
         * translated_vars mapping.  However, currently there's no need
         * because attr_needed is only examined for base relations not
         * otherrels.  So we just leave the child's attr_needed empty.
         */

        /*
         * If we consider partitionwise joins with the parent rel, do the same
         * for partitioned child rels.
         *
         * Note: here we abuse the consider_partitionwise_join flag by setting
         * it for child rels that are not themselves partitioned.  We do so to
         * tell try_partitionwise_join() that the child rel is sufficiently
         * valid to be used as a per-partition input, even if it later gets
         * proven to be dummy.  (It's not usable until we've set up the
         * reltarget and EC entries, which we just did.)
         */
        if (rel->consider_partitionwise_join)
            childrel->consider_partitionwise_join = true;

        /*
         * If parallelism is allowable for this query in general, see whether
         * it's allowable for this childrel in particular.  But if we've
         * already decided the appendrel is not parallel-safe as a whole,
         * there's no point in considering parallelism for this child.  For
         * consistency, do this before calling set_rel_size() for the child.
         */
        if (root->glob->parallelModeOK && rel->consider_parallel)
            set_rel_consider_parallel(root, childrel, childRTE);

        /*
         * Compute the child's size.
         */
        set_rel_size(root, childrel, childRTindex, childRTE);

        /*
         * It is possible that constraint exclusion detected a contradiction
         * within a child subquery, even though we didn't prove one above.  If
         * so, we can skip this child.
         */
        if (IS_DUMMY_REL(childrel))
            continue;

        /* We have at least one live child. */
        has_live_children = true;

        /*
         * If any live child is not parallel-safe, treat the whole appendrel
         * as not parallel-safe.  In future we might be able to generate plans
         * in which some children are farmed out to workers while others are
         * not; but we don't have that today, so it's a waste to consider
         * partial paths anywhere in the appendrel unless it's all safe.
         * (Child rels visited before this one will be unmarked in
         * set_append_rel_pathlist().)
         */
        if (!childrel->consider_parallel)
            rel->consider_parallel = false;

        /*
         * Accumulate size information from each live child.
         */
        Assert(childrel->rows > 0);

        parent_rows += childrel->rows;
        parent_size += childrel->reltarget->width * childrel->rows;
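
        /*
         * (Clarifying note: parent_size accumulates width*rows totals; per
         * the comment at the top of this function, dividing it by parent_rows
         * once all live children have been visited yields the rows-weighted
         * average width for the appendrel.)
         */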

        /*
         * Accumulate per-column estimates too.  We need not do anything for
         * PlaceHolderVars in the parent list.  If child expression isn't a
         * Var, or we didn't record a width estimate for it, we have to fall
         * back on a datatype-based estimate.
         *
         * By construction, child's targetlist is 1-to-1 with parent's.
         */
        forboth(parentvars, rel->reltarget->exprs,
                childvars, childrel->reltarget->exprs)
        {
            Var        *parentvar = (Var *) lfirst(parentvars);
            Node       *childvar = (Node *) lfirst(childvars);

            if (IsA(parentvar, Var))
            {
                int         pndx = parentvar->varattno - rel->min_attr;
                int32       child_width = 0;

                if (IsA(childvar, Var) &&
                    ((Var *) childvar)->varno == childrel->relid)
                {
                    int         cndx = ((Var *) childvar)->varattno - childrel->min_attr;

                    child_width = childrel->attr_widths[cndx];
                }
                if (child_width <= 0)
                    child_width = get_typavgwidth(exprType(childvar),
                                                  exprTypmod(childvar));
                Assert(child_width > 0);
                parent_attrsizes[pndx] += child_width * childrel->rows;
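
                /*
                 * (Clarifying note: each per-attribute total receives the
                 * same rows-weighted treatment as the whole-row width above,
                 * so parent_attrsizes[pndx] / parent_rows will be the blended
                 * width estimate for this column.)
                 */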
            }
        }
    }

    if (has_live_children)
    {
        /*
         * Save the finished size estimates.
         */
        int         i;
|
2008-06-27 05:56:55 +02:00
|
|
|
|
2015-07-26 22:19:08 +02:00
|
|
|
Assert(parent_rows > 0);
|
|
|
|
rel->rows = parent_rows;
|
2016-03-14 21:59:59 +01:00
|
|
|
rel->reltarget->width = rint(parent_size / parent_rows);
|
2008-06-27 05:56:55 +02:00
|
|
|
for (i = 0; i < nattrs; i++)
|
|
|
|
rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
|
2015-07-26 22:19:08 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set "raw tuples" count equal to "rows" for the appendrel; needed
|
|
|
|
* because some places assume rel->tuples is valid for any baserel.
|
|
|
|
*/
|
|
|
|
rel->tuples = parent_rows;
|
2018-11-07 18:12:56 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Note that we leave rel->pages as zero; this is important to avoid
|
|
|
|
* double-counting the appendrel tree in total_table_pages.
|
|
|
|
*/
|
2008-06-27 05:56:55 +02:00
|
|
|
}
|
|
|
|
else
|
2015-07-26 22:19:08 +02:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* All children were excluded by constraints, so mark the whole
|
|
|
|
* appendrel dummy. We must do this in this phase so that the rel's
|
|
|
|
* dummy-ness is visible when we generate paths for other rels.
|
|
|
|
*/
|
|
|
|
set_dummy_rel_pathlist(rel);
|
|
|
|
}
|
2008-06-27 05:56:55 +02:00
|
|
|
|
|
|
|
pfree(parent_attrsizes);
|
2012-01-28 01:26:38 +01:00
|
|
|
}
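For reference, the set_dummy_rel_pathlist call in the else-branch above
amounts to roughly the following sketch; this is an approximation, and the
exact create_append_path argument list varies across versions:

	rel->rows = 0;				/* the one legitimate zero-row case */
	rel->pathlist = NIL;		/* discard anything built so far */
	rel->partial_pathlist = NIL;

	/* Install a childless Append as the rel's only, cheapest path. */
	add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL, NIL,
											  NULL, 0, false, NIL, -1));
	set_cheapest(rel);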
|
|
|
|
|
|
|
|
/*
|
|
|
|
* set_append_rel_pathlist
|
|
|
|
* Build access paths for an "append relation"
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
|
|
|
|
Index rti, RangeTblEntry *rte)
|
|
|
|
{
|
|
|
|
int parentRTindex = rti;
|
|
|
|
List *live_childrels = NIL;
|
|
|
|
ListCell *l;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Generate access paths for each member relation, and remember the
|
2017-03-14 23:20:17 +01:00
|
|
|
* non-dummy children.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
|
|
|
foreach(l, root->append_rel_list)
|
|
|
|
{
|
|
|
|
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
|
|
|
|
int childRTindex;
|
|
|
|
RangeTblEntry *childRTE;
|
|
|
|
RelOptInfo *childrel;
|
|
|
|
|
|
|
|
/* append_rel_list contains all append rels; ignore others */
|
|
|
|
if (appinfo->parent_relid != parentRTindex)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* Re-locate the child RTE and RelOptInfo */
|
|
|
|
childRTindex = appinfo->child_relid;
|
|
|
|
childRTE = root->simple_rte_array[childRTindex];
|
|
|
|
childrel = root->simple_rel_array[childRTindex];
|
|
|
|
|
2016-07-03 23:57:28 +02:00
|
|
|
/*
|
|
|
|
* If set_append_rel_size() decided the parent appendrel was
|
|
|
|
* parallel-unsafe at some point after visiting this child rel, we
|
|
|
|
* need to propagate the unsafety marking down to the child, so that
|
|
|
|
* we don't generate useless partial paths for it.
|
|
|
|
*/
|
|
|
|
if (!rel->consider_parallel)
|
|
|
|
childrel->consider_parallel = false;
|
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
/*
|
|
|
|
* Compute the child's access paths.
|
|
|
|
*/
|
|
|
|
set_rel_pathlist(root, childrel, childRTindex, childRTE);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If child is dummy, ignore it.
|
|
|
|
*/
|
|
|
|
if (IS_DUMMY_REL(childrel))
|
|
|
|
continue;
|
|
|
|
|
Faster partition pruning
Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning. The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.
At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.
This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.
Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2018-04-06 21:23:04 +02:00
|
|
|
/* Bubble up childrel's partitioned children. */
|
|
|
|
if (rel->part_scheme)
|
|
|
|
rel->partitioned_child_rels =
|
|
|
|
list_concat(rel->partitioned_child_rels,
|
Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input. Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".
To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input. (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)
In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it. We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak, not whether there is one. I did adjust two or
three places that had been taking the trouble to free that header so
that they now manually free the whole second List.)
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 17:20:18 +02:00
|
|
|
childrel->partitioned_child_rels);
|
2018-04-06 21:23:04 +02:00
|
|
|
|
2012-08-12 00:42:20 +02:00
|
|
|
/*
|
|
|
|
* Child is live, so add it to the live_childrels list for use below.
|
|
|
|
*/
|
|
|
|
live_childrels = lappend(live_childrels, childrel);
|
2017-03-14 23:20:17 +01:00
|
|
|
}
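The "Bubble up childrel's partitioned children" list_concat above relies on
the post-1cff1b95a semantics described in the "Rationalize use of
list_concat + list_copy combinations" annotation. A minimal sketch of the
idiom change, using illustrative lists x, y, z rather than variables from
this file:

	/* Old idiom: list_concat reused y's cells, so callers needing y
	 * intact had to copy it first. */
	x = list_concat(x, list_copy(y));

	/* New idiom: the result no longer shares y's cells, so the copy
	 * is redundant. */
	x = list_concat(x, y);

	/* When neither input may be touched, the new helper builds a
	 * fresh list, sized correctly up front. */
	z = list_concat_copy(x, y);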
|
|
|
|
|
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins, and (2) if you do hit that scenario, then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 17:11:10 +02:00
|
|
|
/* Add paths to the append relation. */
|
2017-03-14 23:20:17 +01:00
|
|
|
add_paths_to_append_rel(root, rel, live_childrels);
|
|
|
|
}
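The IS_DUMMY_REL test inside the loop above recognizes the marker installed
by set_dummy_rel_pathlist. In simplified form it is roughly the following
sketch; as the truncated annotation at the end of this section explains,
later versions also drill down through any ProjectSetPath stuck on top:

/* Sketch: a rel is dummy if its cheapest path is a childless Append. */
#define IS_DUMMY_APPEND_SKETCH(p) \
	(IsA((p), AppendPath) && ((AppendPath *) (p))->subpaths == NIL)
#define IS_DUMMY_REL_SKETCH(r) \
	((r)->cheapest_total_path != NULL && \
	 IS_DUMMY_APPEND_SKETCH((r)->cheapest_total_path))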
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* add_paths_to_append_rel
|
2017-10-06 17:11:10 +02:00
|
|
|
* Generate paths for the given append relation given the set of non-dummy
|
2017-03-14 23:20:17 +01:00
|
|
|
* child rels.
|
|
|
|
*
|
|
|
|
* The function collects all parameterizations and orderings supported by the
|
|
|
|
* non-dummy children. For every such parameterization or ordering, it creates
|
|
|
|
* an append path collecting one path from each non-dummy child with given
|
|
|
|
* parameterization or ordering. Similarly it collects partial paths from
|
|
|
|
* non-dummy children to create partial append paths.
|
|
|
|
*/
|
Implement partition-wise grouping/aggregation.
If the partition keys of the input relation are part of the GROUP BY
clause, all the rows belonging to a given group come from a single
partition. This allows aggregation/grouping over a partitioned
relation to be broken down into aggregation/grouping on each
partition. This should be no worse, and often better, than the normal
approach.
If the GROUP BY clause does not contain all the partition keys, we can
still perform partial aggregation for each partition and then finalize
aggregation after appending the partial results. This is less certain
to be a win, but it's still useful.
Jeevan Chalke, Ashutosh Bapat, Robert Haas. The larger patch series
of which this patch is a part was also reviewed and tested by Antonin
Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin
Knizhnik, Pascal Legrand, and Rafia Sabih.
Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-03-22 17:49:48 +01:00
|
|
|
void
|
2017-03-14 23:20:17 +01:00
|
|
|
add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
|
|
|
|
List *live_childrels)
|
|
|
|
{
|
|
|
|
List *subpaths = NIL;
|
|
|
|
bool subpaths_valid = true;
|
|
|
|
List *partial_subpaths = NIL;
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
List *pa_partial_subpaths = NIL;
|
|
|
|
List *pa_nonpartial_subpaths = NIL;
|
2017-03-14 23:20:17 +01:00
|
|
|
bool partial_subpaths_valid = true;
|
2018-06-20 04:21:42 +02:00
|
|
|
bool pa_subpaths_valid;
|
2017-03-14 23:20:17 +01:00
|
|
|
List *all_child_pathkeys = NIL;
|
|
|
|
List *all_child_outers = NIL;
|
|
|
|
ListCell *l;
|
2017-03-21 14:48:04 +01:00
|
|
|
List *partitioned_rels = NIL;
|
2017-12-05 23:28:39 +01:00
|
|
|
double partial_rows = -1;
|
2017-03-21 14:48:04 +01:00
|
|
|
|
2018-06-20 04:21:42 +02:00
|
|
|
/* If appropriate, consider parallel append */
|
|
|
|
pa_subpaths_valid = enable_parallel_append && rel->consider_parallel;
|
|
|
|
|
2018-04-06 21:23:04 +02:00
|
|
|
/*
|
|
|
|
* AppendPath generated for partitioned tables must record the RT indexes
|
|
|
|
* of partitioned tables that are direct or indirect children of this
|
|
|
|
* Append rel.
|
|
|
|
*
|
|
|
|
* AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
|
|
|
|
* itself does not represent a partitioned relation, but the child sub-
|
|
|
|
* queries may contain references to partitioned relations. The loop
|
|
|
|
* below will look for such children and collect them in a list to be
|
|
|
|
* passed to the path creation function. (This assumes that we don't need
|
|
|
|
* to look through multiple levels of subquery RTEs; if we ever do, we
|
|
|
|
* could consider stuffing the list we generate here into sub-query RTE's
|
|
|
|
* RelOptInfo, just like we do for partitioned rels, which would be used
|
|
|
|
* when populating our parent rel with paths. For the present, that
|
|
|
|
* appears to be unnecessary.)
|
|
|
|
*/
|
|
|
|
if (rel->part_scheme != NULL)
|
2017-03-21 14:48:04 +01:00
|
|
|
{
|
2018-04-06 21:23:04 +02:00
|
|
|
if (IS_SIMPLE_REL(rel))
|
2018-08-02 01:42:46 +02:00
|
|
|
partitioned_rels = list_make1(rel->partitioned_child_rels);
|
2018-04-06 21:23:04 +02:00
|
|
|
else if (IS_JOIN_REL(rel))
|
2017-10-06 17:11:10 +02:00
|
|
|
{
|
2018-04-06 21:23:04 +02:00
|
|
|
int relid = -1;
|
2018-08-02 01:42:46 +02:00
|
|
|
List *partrels = NIL;
|
2018-04-06 21:23:04 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* For a partitioned joinrel, concatenate the component rels'
|
|
|
|
* partitioned_child_rels lists.
|
|
|
|
*/
|
|
|
|
while ((relid = bms_next_member(rel->relids, relid)) >= 0)
|
|
|
|
{
|
|
|
|
RelOptInfo *component;
|
|
|
|
|
|
|
|
Assert(relid >= 1 && relid < root->simple_rel_array_size);
|
|
|
|
component = root->simple_rel_array[relid];
|
|
|
|
Assert(component->part_scheme != NULL);
|
|
|
|
Assert(list_length(component->partitioned_child_rels) >= 1);
|
2019-08-12 17:20:18 +02:00
|
|
|
partrels = list_concat(partrels,
|
|
|
|
component->partitioned_child_rels);
|
2018-04-06 21:23:04 +02:00
|
|
|
}
|
2018-08-02 01:42:46 +02:00
|
|
|
|
|
|
|
partitioned_rels = list_make1(partrels);
|
2017-10-06 17:11:10 +02:00
|
|
|
}
|
2018-04-06 21:23:04 +02:00
|
|
|
|
|
|
|
Assert(list_length(partitioned_rels) >= 1);
|
2017-10-06 17:11:10 +02:00
|
|
|
}
|
2017-03-14 23:20:17 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* For every non-dummy child, remember the cheapest path. Also, identify
|
|
|
|
* all pathkeys (orderings) and parameterizations (required_outer sets)
|
|
|
|
* available for the non-dummy member relations.
|
|
|
|
*/
|
|
|
|
foreach(l, live_childrels)
|
|
|
|
{
|
|
|
|
RelOptInfo *childrel = lfirst(l);
|
|
|
|
ListCell *lcp;
|
2017-12-05 23:28:39 +01:00
|
|
|
Path *cheapest_partial_path = NULL;
|
2012-08-08 01:02:54 +02:00
|
|
|
|
2017-09-14 16:43:44 +02:00
|
|
|
/*
|
2018-08-02 01:42:46 +02:00
|
|
|
* For UNION ALLs with non-empty partitioned_child_rels, accumulate
|
|
|
|
* the Lists of child relations.
|
2017-09-14 16:43:44 +02:00
|
|
|
*/
|
2018-08-02 01:42:46 +02:00
|
|
|
if (rel->rtekind == RTE_SUBQUERY && childrel->partitioned_child_rels != NIL)
|
|
|
|
partitioned_rels = lappend(partitioned_rels,
|
|
|
|
childrel->partitioned_child_rels);
|
2017-09-14 16:43:44 +02:00
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
/*
|
2012-08-12 00:42:20 +02:00
|
|
|
* If child has an unparameterized cheapest-total path, add that to
|
|
|
|
* the unparameterized Append path we are constructing for the parent.
|
|
|
|
* If not, there's no workable unparameterized path.
|
2018-03-22 17:49:48 +01:00
|
|
|
*
|
|
|
|
* With partitionwise aggregates, the child rel's pathlist may be
|
|
|
|
* empty, so don't assume that a path exists here.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
2018-03-22 17:49:48 +01:00
|
|
|
if (childrel->pathlist != NIL &&
|
|
|
|
childrel->cheapest_total_path->param_info == NULL)
|
2017-12-05 23:28:39 +01:00
|
|
|
accumulate_append_subpath(childrel->cheapest_total_path,
|
|
|
|
&subpaths, NULL);
|
2012-08-12 00:42:20 +02:00
|
|
|
else
|
|
|
|
subpaths_valid = false;
|
2012-01-28 01:26:38 +01:00
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/* Same idea, but for a partial plan. */
|
|
|
|
if (childrel->partial_pathlist != NIL)
|
2017-12-05 23:28:39 +01:00
|
|
|
{
|
|
|
|
cheapest_partial_path = linitial(childrel->partial_pathlist);
|
|
|
|
accumulate_append_subpath(cheapest_partial_path,
|
|
|
|
&partial_subpaths, NULL);
|
|
|
|
}
|
2016-01-20 20:29:22 +01:00
|
|
|
else
|
|
|
|
partial_subpaths_valid = false;
|
|
|
|
|
2017-12-05 23:28:39 +01:00
|
|
|
/*
|
|
|
|
* Same idea, but for a parallel append mixing partial and non-partial
|
|
|
|
* paths.
|
|
|
|
*/
|
|
|
|
if (pa_subpaths_valid)
|
|
|
|
{
|
|
|
|
Path *nppath = NULL;
|
|
|
|
|
|
|
|
nppath =
|
|
|
|
get_cheapest_parallel_safe_total_inner(childrel->pathlist);
|
|
|
|
|
|
|
|
if (cheapest_partial_path == NULL && nppath == NULL)
|
|
|
|
{
|
|
|
|
/* Neither a partial nor a parallel-safe path? Forget it. */
|
|
|
|
pa_subpaths_valid = false;
|
|
|
|
}
|
|
|
|
else if (nppath == NULL ||
|
|
|
|
(cheapest_partial_path != NULL &&
|
|
|
|
cheapest_partial_path->total_cost < nppath->total_cost))
|
|
|
|
{
|
|
|
|
/* Partial path is cheaper or the only option. */
|
|
|
|
Assert(cheapest_partial_path != NULL);
|
|
|
|
accumulate_append_subpath(cheapest_partial_path,
|
|
|
|
&pa_partial_subpaths,
|
|
|
|
&pa_nonpartial_subpaths);
|
|
|
|
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Either we've got only a non-partial path, or we think that
|
|
|
|
* a single backend can execute the best non-partial path
|
|
|
|
* faster than all the parallel backends working together can
|
|
|
|
* execute the best partial path.
|
|
|
|
*
|
|
|
|
* It might make sense to be more aggressive here. Even if
|
|
|
|
* the best non-partial path is more expensive than the best
|
|
|
|
* partial path, it could still be better to choose the
|
|
|
|
* non-partial path if there are several such paths that can
|
|
|
|
* be given to different workers. For now, we don't try to
|
|
|
|
* figure that out.
|
|
|
|
*/
|
|
|
|
accumulate_append_subpath(nppath,
|
|
|
|
&pa_nonpartial_subpaths,
|
|
|
|
NULL);
|
|
|
|
}
|
|
|
|
}
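get_cheapest_parallel_safe_total_inner, used above, scans a pathlist for the
cheapest parallel-safe path that requires no outer rels; since pathlists are
kept sorted by total cost, the first match is the cheapest. A sketch,
assuming it stays close to (but is not guaranteed to match verbatim) the
real helper in pathkeys.c:

static Path *
cheapest_parallel_safe_total_sketch(List *paths)
{
	ListCell   *lc;

	foreach(lc, paths)
	{
		Path	   *path = (Path *) lfirst(lc);

		/* pathlist is cost-ordered, so the first hit is cheapest */
		if (path->parallel_safe && bms_is_empty(PATH_REQ_OUTER(path)))
			return path;
	}

	return NULL;
}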
|
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
/*
|
|
|
|
* Collect lists of all the available path orderings and
|
2014-05-06 18:12:18 +02:00
|
|
|
* parameterizations for all the children. We use these as a
|
2012-01-28 01:26:38 +01:00
|
|
|
* heuristic to indicate which sort orderings and parameterizations we
|
|
|
|
* should build Append and MergeAppend paths for.
|
|
|
|
*/
|
|
|
|
foreach(lcp, childrel->pathlist)
|
|
|
|
{
|
|
|
|
Path *childpath = (Path *) lfirst(lcp);
|
|
|
|
List *childkeys = childpath->pathkeys;
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
Relids childouter = PATH_REQ_OUTER(childpath);
|
2012-01-28 01:26:38 +01:00
|
|
|
|
|
|
|
/* Unsorted paths don't contribute to pathkey list */
|
|
|
|
if (childkeys != NIL)
|
|
|
|
{
|
|
|
|
ListCell *lpk;
|
|
|
|
bool found = false;
|
|
|
|
|
|
|
|
/* Have we already seen this ordering? */
|
|
|
|
foreach(lpk, all_child_pathkeys)
|
|
|
|
{
|
|
|
|
List *existing_pathkeys = (List *) lfirst(lpk);
|
|
|
|
|
|
|
|
if (compare_pathkeys(existing_pathkeys,
|
|
|
|
childkeys) == PATHKEYS_EQUAL)
|
|
|
|
{
|
|
|
|
found = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (!found)
|
|
|
|
{
|
|
|
|
/* No, so add it to all_child_pathkeys */
|
|
|
|
all_child_pathkeys = lappend(all_child_pathkeys,
|
|
|
|
childkeys);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Unparameterized paths don't contribute to param-set list */
|
|
|
|
if (childouter)
|
|
|
|
{
|
|
|
|
ListCell *lco;
|
|
|
|
bool found = false;
|
|
|
|
|
|
|
|
/* Have we already seen this param set? */
|
|
|
|
foreach(lco, all_child_outers)
|
|
|
|
{
|
2012-06-10 21:20:04 +02:00
|
|
|
Relids existing_outers = (Relids) lfirst(lco);
|
2012-01-28 01:26:38 +01:00
|
|
|
|
|
|
|
if (bms_equal(existing_outers, childouter))
|
|
|
|
{
|
|
|
|
found = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (!found)
|
|
|
|
{
|
|
|
|
/* No, so add it to all_child_outers */
|
|
|
|
all_child_outers = lappend(all_child_outers,
|
|
|
|
childouter);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
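accumulate_append_subpath, called throughout the loop above, exists to
flatten nested Appends rather than stack one on top of another. A simplified
sketch; the real function also handles MergeAppend children and, via its
third argument, the splitting of a child Parallel Append's subpaths into
partial and non-partial lists:

static void
accumulate_subpath_sketch(Path *path, List **subpaths)
{
	if (IsA(path, AppendPath))
	{
		AppendPath *apath = (AppendPath *) path;

		/* Pull the child Append's own subpaths up into our list,
		 * avoiding a useless extra Append node at execution time. */
		*subpaths = list_concat(*subpaths, apath->subpaths);
	}
	else
		*subpaths = lappend(*subpaths, path);
}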
|
2007-01-28 19:50:40 +01:00
|
|
|
|
2000-11-12 01:37:02 +01:00
|
|
|
/*
|
2012-08-12 00:42:20 +02:00
|
|
|
* If we found unparameterized paths for all children, build an unordered,
|
|
|
|
* unparameterized Append path for the rel. (Note: this is correct even
|
|
|
|
* if we have zero or one live subpath due to constraint exclusion.)
|
2000-11-12 01:37:02 +01:00
|
|
|
*/
|
2012-08-12 00:42:20 +02:00
|
|
|
if (subpaths_valid)
|
Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.
This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:
1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.
2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.
Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key, we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans, meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
For parameters whose values are only known during actual execution, the
pruning of these subplans must wait. Subplans which are eliminated during
this stage of pruning are still visible in the EXPLAIN output. To determine
whether pruning has actually taken place, the EXPLAIN ANALYZE output must be
inspected: if a certain Append subplan was never executed due to the
elimination of its partition, the execution timing area will state
"(never executed)"; alternatively, as in the case of parameterized nested
loops, the number of loops stated in the EXPLAIN ANALYZE output for certain
subplans may appear lower than for others because those subplans were
scanned fewer times. This is because the list of matching subnodes must be
re-evaluated whenever a parameter that matches the partition key changes.
This commit required some additional infrastructure that permits the
building of a data structure which can translate the matching partition IDs,
as returned by get_matching_partitions, into the list indexes of a subpaths
list, as exists in node types such as Append, MergeAppend and ModifyTable.
This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 22:54:31 +02:00
|
|
|
add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL,
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
NIL, NULL, 0, false,
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
partitioned_rels, -1));
|
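The pruning commit message above describes infrastructure that translates matching partition IDs, as returned by get_matching_partitions, into indexes of an Append's subpaths list. A minimal sketch of that translation, using PostgreSQL's Bitmapset API but with a hypothetical mapping array (partidx_to_subpath) standing in for the actual pruning structures:

#include "postgres.h"
#include "nodes/bitmapset.h"

/*
 * Sketch only (hypothetical helper, not the committed infrastructure):
 * partidx_to_subpath maps a partition index, as used by
 * get_matching_partitions, to the position of that partition's subpath
 * in an Append's subpaths list, or -1 if the partition has no subpath.
 * Return the set of subpath indexes that must be executed.
 */
static Bitmapset *
translate_partindexes_to_subpaths(const int *partidx_to_subpath,
                                  const Bitmapset *matching_parts)
{
    Bitmapset  *result = NULL;
    int         i = -1;

    while ((i = bms_next_member(matching_parts, i)) >= 0)
    {
        int         subpath_idx = partidx_to_subpath[i];

        if (subpath_idx >= 0)
            result = bms_add_member(result, subpath_idx);
    }
    return result;
}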
2016-01-20 20:29:22 +01:00
|
|
|
|
|
|
|
/*
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
* Consider an append of unordered, unparameterized partial paths. Make
|
|
|
|
* it parallel-aware if possible.
|
2016-01-20 20:29:22 +01:00
|
|
|
*/
|
Fix handling of targetlist SRFs when scan/join relation is known empty.
When we introduced separate ProjectSetPath nodes for application of
set-returning functions in v10, we inadvertently broke some cases where
we're supposed to recognize that the result of a subquery is known to be
empty (contain zero rows). That's because IS_DUMMY_REL was just looking
for a childless AppendPath without allowing for a ProjectSetPath being
possibly stuck on top. In itself, this didn't do anything much worse
than produce slightly worse plans for some corner cases.
Then in v11, commit 11cf92f6e rearranged things to allow the scan/join
targetlist to be applied directly to partial paths before they get
gathered. But it inserted a short-circuit path for dummy relations
that was a little too short: it failed to insert a ProjectSetPath node
at all for a targetlist containing set-returning functions, resulting in
bogus "set-valued function called in context that cannot accept a set"
errors, as reported in bug #15669 from Madelaine Thibaut.
The best way to fix this mess seems to be to reimplement IS_DUMMY_REL
so that it drills down through any ProjectSetPath nodes that might be
there (and it seems like we'd better allow for ProjectionPath as well).
While we're at it, make it look at rel->pathlist not cheapest_total_path,
so that it gives the right answer independently of whether set_cheapest
has been done lately. That dependency looks pretty shaky in the context
of code like apply_scanjoin_target_to_paths, and even if it's not broken
today it'd certainly bite us at some point. (Nastily, unsafe use of the
old coding would almost always work; the hazard comes down to possibly
looking through a dangling pointer, and only once in a blue moon would
you find something there that resulted in the wrong answer.)
It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if
there are any extensions using it, they'll continue to use the old
inadequate logic until they're recompiled, after which they'll fail
to load into server versions predating this fix. Hopefully there are
few such extensions.
Having fixed IS_DUMMY_REL, the special path for dummy rels in
apply_scanjoin_target_to_paths is unnecessary as well as being wrong,
so we can just drop it.
Also change a few places that were testing for partitioned-ness of a
planner relation but not using IS_PARTITIONED_REL for the purpose; that
seems unsafe as well as inconsistent, plus it required an ugly hack in
apply_scanjoin_target_to_paths.
In passing, save a few cycles in apply_scanjoin_target_to_paths by
skipping processing of pre-existing paths for partitioned rels,
and do some cosmetic cleanup and comment adjustment in that function.
I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking
any code that might be using it, since in almost every case that would
be wrong; IS_DUMMY_REL is what should be used instead.
In HEAD, also make set_dummy_rel_pathlist static (since it's no longer
used from outside allpaths.c), and delete is_dummy_plan, since it's no
longer used anywhere.
Back-patch as appropriate into v11 and v10.
Tom Lane and Julien Rouhaud
Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org
2019-03-07 20:21:52 +01:00
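A sketch of the IS_DUMMY_REL rework that commit message describes, written from the description rather than the committed code; rel_is_dummy is a hypothetical name, and the drill-down assumes only that ProjectionPath and ProjectSetPath each carry a subpath field:

#include "postgres.h"
#include "nodes/pathnodes.h"

/*
 * Sketch only: look at rel->pathlist, drill down through any
 * ProjectionPath or ProjectSetPath nodes, and report whether what
 * remains is a childless Append.
 */
static bool
rel_is_dummy(RelOptInfo *rel)
{
    Path       *path;

    if (rel->pathlist == NIL)
        return false;
    path = (Path *) linitial(rel->pathlist);
    for (;;)
    {
        if (IsA(path, ProjectionPath))
            path = ((ProjectionPath *) path)->subpath;
        else if (IsA(path, ProjectSetPath))
            path = ((ProjectSetPath *) path)->subpath;
        else
            break;
    }
    return IsA(path, AppendPath) &&
        ((AppendPath *) path)->subpaths == NIL;
}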
|
|
|
if (partial_subpaths_valid && partial_subpaths != NIL)
|
2016-01-20 20:29:22 +01:00
|
|
|
{
|
|
|
|
AppendPath *appendpath;
|
|
|
|
ListCell *lc;
|
2016-06-09 15:08:27 +02:00
|
|
|
int parallel_workers = 0;
|
2016-01-20 20:29:22 +01:00
|
|
|
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
/* Find the highest number of workers requested for any subpath. */
|
2016-01-20 20:29:22 +01:00
|
|
|
foreach(lc, partial_subpaths)
|
|
|
|
{
|
|
|
|
Path *path = lfirst(lc);
|
|
|
|
|
2016-06-09 15:08:27 +02:00
|
|
|
parallel_workers = Max(parallel_workers, path->parallel_workers);
|
2016-01-20 20:29:22 +01:00
|
|
|
}
|
2016-06-09 15:08:27 +02:00
|
|
|
Assert(parallel_workers > 0);
|
2016-01-20 20:29:22 +01:00
|
|
|
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
/*
|
|
|
|
* If the use of parallel append is permitted, always request at least
|
2018-03-14 18:51:14 +01:00
|
|
|
* log2(# of children) workers. We assume it can be useful to have
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
* extra workers in this case because they will be spread out across
|
|
|
|
* the children. The precise formula is just a guess, but we don't
|
|
|
|
* want to end up with a radically different answer for a table with N
|
|
|
|
* partitions vs. an unpartitioned table with the same data, so the
|
|
|
|
* use of some kind of log-scaling here seems to make some sense.
|
|
|
|
*/
|
|
|
|
if (enable_parallel_append)
|
|
|
|
{
|
|
|
|
parallel_workers = Max(parallel_workers,
|
|
|
|
fls(list_length(live_childrels)));
|
|
|
|
parallel_workers = Min(parallel_workers,
|
|
|
|
max_parallel_workers_per_gather);
|
|
|
|
}
|
|
|
|
Assert(parallel_workers > 0);
|
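As a concrete illustration of the log-scaling comment above: fls(x) returns the 1-based position of the highest set bit, roughly floor(log2(x)) + 1. The standalone sketch below assumes 10 live children, partial subpaths that request at most 2 workers, and max_parallel_workers_per_gather = 8; the names and values are illustrative only.

#include <stdio.h>

/*
 * Stand-in for fls(); PostgreSQL uses the BSD function or its own port
 * version.  Defined locally to keep this sketch self-contained.
 */
static int
my_fls(int x)
{
    int         pos = 0;

    while (x != 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

int
main(void)
{
    int         nchildren = 10;         /* assumed number of live child rels */
    int         parallel_workers = 2;   /* max requested by any partial subpath */
    int         per_gather_limit = 8;   /* assumed max_parallel_workers_per_gather */

    /* request at least log2-ish workers: my_fls(10) == 4 */
    if (parallel_workers < my_fls(nchildren))
        parallel_workers = my_fls(nchildren);
    /* but never more than the per-Gather limit */
    if (parallel_workers > per_gather_limit)
        parallel_workers = per_gather_limit;

    printf("%d workers\n", parallel_workers);   /* prints 4 */
    return 0;
}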
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
/* Generate a partial append path. */
|
Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can usefully be applied to
further prune partitions.
This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:
1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.
2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.
Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key, we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans, meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
For parameters whose values are only known during actual execution,
the pruning of these subplans must wait until execution time. Subplans
which are eliminated during this stage of pruning are still visible in
the EXPLAIN output. To determine whether pruning has actually taken
place, the EXPLAIN ANALYZE output must be inspected. If a certain
Append subplan was never executed due to the elimination of the
partition, then the execution timing area will state "(never
executed)". Alternatively, as in the case of parameterized nested
loops, the number of loops stated in the EXPLAIN ANALYZE output for
certain subplans may appear lower than for others because those
subplans were scanned fewer times. This is because the list of
matching subnodes must be re-evaluated whenever a parameter that
matches the partition key changes.
This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exists in node types such as
Append, MergeAppend and ModifyTable. This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 22:54:31 +02:00
|
|
|
appendpath = create_append_path(root, rel, NIL, partial_subpaths,
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
NIL, NULL, parallel_workers,
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
enable_parallel_append,
|
|
|
|
partitioned_rels, -1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure any subsequent partial paths use the same row count
|
|
|
|
* estimate.
|
|
|
|
*/
|
|
|
|
partial_rows = appendpath->path.rows;
|
|
|
|
|
|
|
|
/* Add the path. */
|
|
|
|
add_partial_path(rel, (Path *) appendpath);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Consider a parallel-aware append using a mix of partial and non-partial
|
|
|
|
* paths. (This only makes sense if there's at least one child which has
|
|
|
|
* a non-partial path that is substantially cheaper than any partial path;
|
|
|
|
* otherwise, we should use the append path added in the previous step.)
|
|
|
|
*/
|
|
|
|
if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
|
|
|
|
{
|
|
|
|
AppendPath *appendpath;
|
|
|
|
ListCell *lc;
|
|
|
|
int parallel_workers = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the highest number of workers requested for any partial
|
|
|
|
* subpath.
|
|
|
|
*/
|
|
|
|
foreach(lc, pa_partial_subpaths)
|
|
|
|
{
|
|
|
|
Path *path = lfirst(lc);
|
|
|
|
|
|
|
|
parallel_workers = Max(parallel_workers, path->parallel_workers);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Same formula here as above. It's even more important in this
|
|
|
|
* instance because the non-partial paths won't contribute anything to
|
|
|
|
* the planned number of parallel workers.
|
|
|
|
*/
|
|
|
|
parallel_workers = Max(parallel_workers,
|
|
|
|
fls(list_length(live_childrels)));
|
|
|
|
parallel_workers = Min(parallel_workers,
|
|
|
|
max_parallel_workers_per_gather);
|
|
|
|
Assert(parallel_workers > 0);
|
|
|
|
|
Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can usefully be applied to
further prune partitions.
This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:
1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.
2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.
Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key, we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans, meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
For parameters whose values are only known during actual execution,
the pruning of these subplans must wait until execution time. Subplans
which are eliminated during this stage of pruning are still visible in
the EXPLAIN output. To determine whether pruning has actually taken
place, the EXPLAIN ANALYZE output must be inspected. If a certain
Append subplan was never executed due to the elimination of the
partition, then the execution timing area will state "(never
executed)". Alternatively, as in the case of parameterized nested
loops, the number of loops stated in the EXPLAIN ANALYZE output for
certain subplans may appear lower than for others because those
subplans were scanned fewer times. This is because the list of
matching subnodes must be re-evaluated whenever a parameter that
matches the partition key changes.
This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exists in node types such as
Append, MergeAppend and ModifyTable. This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 22:54:31 +02:00
|
|
|
appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
pa_partial_subpaths,
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
NIL, NULL, parallel_workers, true,
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
partitioned_rels, partial_rows);
|
2016-01-20 20:29:22 +01:00
|
|
|
add_partial_path(rel, (Path *) appendpath);
|
|
|
|
}
|
2000-11-12 01:37:02 +01:00
|
|
|
|
2010-10-14 22:56:39 +02:00
|
|
|
/*
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
* Also build unparameterized ordered append paths based on the collected
|
2012-08-12 00:42:20 +02:00
|
|
|
* list of child pathkeys.
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
2012-08-12 00:42:20 +02:00
|
|
|
if (subpaths_valid)
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
generate_orderedappend_paths(root, rel, live_childrels,
|
|
|
|
all_child_pathkeys,
|
|
|
|
partitioned_rels);
|
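The startup-cost workaround described in the annotation above (charging an ordered Append the sum of its children's startup costs, as MergeAppend would) can be sketched as follows; this is an illustration of the rule, not the code in costsize.c:

#include "postgres.h"
#include "nodes/pathnodes.h"

/*
 * Sketch only: the conservative startup-cost rule for an ordered Append,
 * i.e. charge the sum of the children's startup costs rather than just
 * the first child's.
 */
static Cost
ordered_append_startup_cost(List *subpaths)
{
    Cost        startup_cost = 0;
    ListCell   *lc;

    foreach(lc, subpaths)
    {
        Path       *subpath = (Path *) lfirst(lc);

        startup_cost += subpath->startup_cost;
    }
    return startup_cost;
}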
2012-01-28 01:26:38 +01:00
|
|
|
|
|
|
|
/*
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
* Build Append paths for each parameterization seen among the child rels.
|
|
|
|
* (This may look pretty expensive, but in most cases of practical
|
|
|
|
* interest, the child rels will expose mostly the same parameterizations,
|
|
|
|
* so that not that many cases actually get considered here.)
|
|
|
|
*
|
|
|
|
* The Append node itself cannot enforce quals, so all qual checking must
|
2014-05-06 18:12:18 +02:00
|
|
|
* be done in the child paths. This means that to have a parameterized
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
* Append path, we must have the exact same parameterization for each
|
|
|
|
* child path; otherwise some children might be failing to check the
|
|
|
|
* moved-down quals. To make them match up, we can try to increase the
|
|
|
|
* parameterization of lesser-parameterized paths.
|
2010-10-14 22:56:39 +02:00
|
|
|
*/
|
2012-01-28 01:26:38 +01:00
|
|
|
foreach(l, all_child_outers)
|
|
|
|
{
|
2012-06-10 21:20:04 +02:00
|
|
|
Relids required_outer = (Relids) lfirst(l);
|
2012-01-28 01:26:38 +01:00
|
|
|
ListCell *lcr;
|
|
|
|
|
|
|
|
/* Select the child paths for an Append with this parameterization */
|
|
|
|
subpaths = NIL;
|
2012-08-12 00:42:20 +02:00
|
|
|
subpaths_valid = true;
|
2012-01-28 01:26:38 +01:00
|
|
|
foreach(lcr, live_childrels)
|
|
|
|
{
|
|
|
|
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
|
2013-07-08 04:37:24 +02:00
|
|
|
Path *subpath;
|
2012-01-28 01:26:38 +01:00
|
|
|
|
Implement partition-wise grouping/aggregation.
If the partition keys of the input relation are part of the GROUP BY
clause, all the rows belonging to a given group come from a single
partition. This allows aggregation/grouping over a partitioned
relation to be broken down into aggregation/grouping on each
partition. This should be no worse, and often better, than the normal
approach.
If the GROUP BY clause does not contain all the partition keys, we can
still perform partial aggregation for each partition and then finalize
aggregation after appending the partial results. This is less certain
to be a win, but it's still useful.
Jeevan Chalke, Ashutosh Bapat, Robert Haas. The larger patch series
of which this patch is a part was also reviewed and tested by Antonin
Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin
Knizhnik, Pascal Legrand, and Rafia Sabih.
Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-03-22 17:49:48 +01:00
|
|
|
if (childrel->pathlist == NIL)
|
|
|
|
{
|
|
|
|
/* failed to make a suitable path for this child */
|
|
|
|
subpaths_valid = false;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2013-07-08 04:37:24 +02:00
|
|
|
subpath = get_cheapest_parameterized_child_path(root,
|
|
|
|
childrel,
|
|
|
|
required_outer);
|
|
|
|
if (subpath == NULL)
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
{
|
2013-07-08 04:37:24 +02:00
|
|
|
/* failed to make a suitable path for this child */
|
|
|
|
subpaths_valid = false;
|
|
|
|
break;
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
}
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
accumulate_append_subpath(subpath, &subpaths, NULL);
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
|
2012-08-12 00:42:20 +02:00
|
|
|
if (subpaths_valid)
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
add_path(rel, (Path *)
|
Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can usefully be applied to
further prune partitions.
This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:
1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.
2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.
Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key, we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans, meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
For parameters whose values are only known during actual execution,
the pruning of these subplans must wait until execution time. Subplans
which are eliminated during this stage of pruning are still visible in
the EXPLAIN output. To determine whether pruning has actually taken
place, the EXPLAIN ANALYZE output must be inspected. If a certain
Append subplan was never executed due to the elimination of the
partition, then the execution timing area will state "(never
executed)". Alternatively, as in the case of parameterized nested
loops, the number of loops stated in the EXPLAIN ANALYZE output for
certain subplans may appear lower than for others because those
subplans were scanned fewer times. This is because the list of
matching subnodes must be re-evaluated whenever a parameter that
matches the partition key changes.
This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exists in node types such as
Append, MergeAppend and ModifyTable. This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 22:54:31 +02:00
|
|
|
create_append_path(root, rel, subpaths, NIL,
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
NIL, required_outer, 0, false,
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
partitioned_rels, -1));
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
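The parameterized-path commit message annotating this loop notes that rowcount estimates are cached so that all paths with the same parameterization agree. A hedged sketch of such a lookup over rel->ppilist; the helper name is hypothetical, though ParamPathInfo and its fields are the planner's actual caching mechanism:

#include "postgres.h"
#include "nodes/pathnodes.h"

/*
 * Sketch only: find a previously-cached rowcount estimate for the given
 * parameterization.  A negative result means no ParamPathInfo exists yet
 * for this set of required outer rels, and the caller must compute and
 * cache one.
 */
static double
cached_parameterized_rows(RelOptInfo *rel, Relids required_outer)
{
    ListCell   *lc;

    foreach(lc, rel->ppilist)
    {
        ParamPathInfo *ppi = (ParamPathInfo *) lfirst(lc);

        if (bms_equal(ppi->ppi_req_outer, required_outer))
            return ppi->ppi_rows;
    }
    return -1;
}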
Suppress Append and MergeAppend plan nodes that have a single child.
If there's only one child relation, the Append or MergeAppend isn't
doing anything useful, and can be elided. It does have a purpose
during planning though, which is to serve as a buffer between parent
and child Var numbering. Therefore we keep it all the way through
to setrefs.c, and get rid of it only after fixing references in the
plan level(s) above it. This works largely the same as setrefs.c's
ancient hack to get rid of no-op SubqueryScan nodes, and can even
share some code with that.
Note the change to make setrefs.c use apply_tlist_labeling rather than
ad-hoc code. This has the effect of propagating the child's resjunk
and ressortgroupref labels, which formerly weren't propagated when
removing a SubqueryScan. Doing that is demonstrably necessary for
the [Merge]Append cases, and seems harmless for SubqueryScan, if only
because trivial_subqueryscan is afraid to collapse cases where the
resjunk marking differs. (I suspect that restriction could now be
removed, though it's unclear that it'd make any new matches possible,
since the outer query can't have references to a child resjunk column.)
David Rowley, reviewed by Alvaro Herrera and Tomas Vondra
Discussion: https://postgr.es/m/CAKJS1f_7u8ATyJ1JGTMHFoKDvZdeF-iEBhs+sM_SXowOr9cArg@mail.gmail.com
2019-03-25 20:42:35 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* When there is only a single child relation, the Append path can inherit
|
|
|
|
* any ordering available for the child rel's path, so that it's useful to
|
|
|
|
* consider ordered partial paths. Above we only considered the cheapest
|
|
|
|
* partial path for each child, but let's also make paths using any
|
|
|
|
* partial paths that have pathkeys.
|
|
|
|
*/
|
|
|
|
if (list_length(live_childrels) == 1)
|
|
|
|
{
|
|
|
|
RelOptInfo *childrel = (RelOptInfo *) linitial(live_childrels);
|
|
|
|
|
|
|
|
foreach(l, childrel->partial_pathlist)
|
|
|
|
{
|
|
|
|
Path *path = (Path *) lfirst(l);
|
|
|
|
AppendPath *appendpath;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Skip paths with no pathkeys. Also skip the cheapest partial
|
|
|
|
* path, since we already used that above.
|
|
|
|
*/
|
|
|
|
if (path->pathkeys == NIL ||
|
|
|
|
path == linitial(childrel->partial_pathlist))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
appendpath = create_append_path(root, rel, NIL, list_make1(path),
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
NIL, NULL,
|
|
|
|
path->parallel_workers, true,
|
Suppress Append and MergeAppend plan nodes that have a single child.
If there's only one child relation, the Append or MergeAppend isn't
doing anything useful, and can be elided. It does have a purpose
during planning though, which is to serve as a buffer between parent
and child Var numbering. Therefore we keep it all the way through
to setrefs.c, and get rid of it only after fixing references in the
plan level(s) above it. This works largely the same as setrefs.c's
ancient hack to get rid of no-op SubqueryScan nodes, and can even
share some code with that.
Note the change to make setrefs.c use apply_tlist_labeling rather than
ad-hoc code. This has the effect of propagating the child's resjunk
and ressortgroupref labels, which formerly weren't propagated when
removing a SubqueryScan. Doing that is demonstrably necessary for
the [Merge]Append cases, and seems harmless for SubqueryScan, if only
because trivial_subqueryscan is afraid to collapse cases where the
resjunk marking differs. (I suspect that restriction could now be
removed, though it's unclear that it'd make any new matches possible,
since the outer query can't have references to a child resjunk column.)
David Rowley, reviewed by Alvaro Herrera and Tomas Vondra
Discussion: https://postgr.es/m/CAKJS1f_7u8ATyJ1JGTMHFoKDvZdeF-iEBhs+sM_SXowOr9cArg@mail.gmail.com
2019-03-25 20:42:35 +01:00
|
|
|
partitioned_rels, partial_rows);
|
|
|
|
add_partial_path(rel, (Path *) appendpath);
|
|
|
|
}
|
|
|
|
}
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
* generate_orderedappend_paths
|
|
|
|
* Generate ordered append paths for an append relation
|
2012-01-28 01:26:38 +01:00
|
|
|
*
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
|
|
|
* Usually we generate MergeAppend paths here, but there are some special
|
|
|
|
* cases where we can generate simple Append paths, because the subpaths
|
|
|
|
* can provide tuples in the required order already.
|
|
|
|
*
|
|
|
|
* We generate a path for each ordering (pathkey list) appearing in
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
* all_child_pathkeys.
|
2012-01-28 01:26:38 +01:00
|
|
|
*
|
|
|
|
* We consider both cheapest-startup and cheapest-total cases, ie, for each
|
|
|
|
* interesting ordering, collect all the cheapest startup subpaths and all the
|
2019-04-06 01:20:30 +02:00
|
|
|
* cheapest total paths, and build a suitable path for each case.
|
2012-04-19 21:52:46 +02:00
|
|
|
*
|
2019-04-06 01:20:30 +02:00
|
|
|
* We don't currently generate any parameterized ordered paths here. While
|
2012-04-19 21:52:46 +02:00
|
|
|
* it would not take much more code here to do so, it's very unclear that it
|
|
|
|
* is worth the planning cycles to investigate such paths: there's little
|
|
|
|
* use for an ordered path on the inside of a nestloop. In fact, it's likely
|
|
|
|
* that the current coding of add_path would reject such paths out of hand,
|
|
|
|
* because add_path gives no credit for sort ordering of parameterized paths,
|
|
|
|
* and a parameterized MergeAppend is going to be more expensive than the
|
|
|
|
* corresponding parameterized Append path. If we ever try harder to support
|
|
|
|
* parameterized mergejoin plans, it might be worth adding support for
|
2019-04-06 01:20:30 +02:00
|
|
|
* parameterized paths here to feed such joins. (See notes in
|
2012-04-19 21:52:46 +02:00
|
|
|
* optimizer/README for why that might not ever happen, though.)
|
2012-01-28 01:26:38 +01:00
|
|
|
*/
|
|
|
|
static void
|
2019-04-06 01:20:30 +02:00
|
|
|
generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
|
|
|
|
List *live_childrels,
|
|
|
|
List *all_child_pathkeys,
|
|
|
|
List *partitioned_rels)
|
2012-01-28 01:26:38 +01:00
|
|
|
{
|
|
|
|
ListCell *lcp;
|
2019-04-06 01:20:30 +02:00
|
|
|
List *partition_pathkeys = NIL;
|
|
|
|
List *partition_pathkeys_desc = NIL;
|
|
|
|
bool partition_pathkeys_partial = true;
|
|
|
|
bool partition_pathkeys_desc_partial = true;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Some partitioned table setups may allow us to use an Append node
|
|
|
|
* instead of a MergeAppend. This is possible in cases such as RANGE
|
|
|
|
* partitioned tables where it's guaranteed that an earlier partition must
|
|
|
|
* contain rows which come earlier in the sort order. To detect whether
|
|
|
|
* this is relevant, build pathkey descriptions of the partition ordering,
|
|
|
|
* for both forward and reverse scans.
|
|
|
|
*/
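	/*
	 * Illustrative example (added; not in the original source): for
	 * CREATE TABLE t (a int) PARTITION BY RANGE (a) with partitions
	 * covering [0,10) and [10,20), every row of the first partition
	 * sorts before every row of the second, so ORDER BY a can be
	 * satisfied by a plain Append over ordered per-partition scans.
	 */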
|
|
|
|
if (rel->part_scheme != NULL && IS_SIMPLE_REL(rel) &&
|
|
|
|
partitions_are_ordered(rel->boundinfo, rel->nparts))
|
|
|
|
{
|
|
|
|
partition_pathkeys = build_partition_pathkeys(root, rel,
|
|
|
|
ForwardScanDirection,
|
|
|
|
&partition_pathkeys_partial);
|
|
|
|
|
|
|
|
partition_pathkeys_desc = build_partition_pathkeys(root, rel,
|
|
|
|
BackwardScanDirection,
|
|
|
|
&partition_pathkeys_desc_partial);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* You might think we should truncate_useless_pathkeys here, but
|
|
|
|
* allowing partition keys which are a subset of the query's pathkeys
|
|
|
|
* can often be useful. For example, consider a table partitioned by
|
|
|
|
* RANGE (a, b), and a query with ORDER BY a, b, c. If we have child
|
|
|
|
* paths that can produce the a, b, c ordering (perhaps via indexes on
|
|
|
|
* (a, b, c)) then it works to consider the appendrel output as
|
|
|
|
* ordered by a, b, c.
|
|
|
|
*/
|
|
|
|
}
|
2012-01-28 01:26:38 +01:00
|
|
|
|
2019-04-06 01:20:30 +02:00
|
|
|
/* Now consider each interesting sort ordering */
|
2012-01-28 01:26:38 +01:00
|
|
|
foreach(lcp, all_child_pathkeys)
|
2010-10-14 22:56:39 +02:00
|
|
|
{
|
2012-01-28 01:26:38 +01:00
|
|
|
List *pathkeys = (List *) lfirst(lcp);
|
2011-04-10 17:42:00 +02:00
|
|
|
List *startup_subpaths = NIL;
|
|
|
|
List *total_subpaths = NIL;
|
|
|
|
bool startup_neq_total = false;
|
|
|
|
ListCell *lcr;
|
2019-04-06 01:20:30 +02:00
|
|
|
bool match_partition_order;
|
|
|
|
bool match_partition_order_desc;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Determine if this sort ordering matches any partition pathkeys we
|
|
|
|
* have, for both ascending and descending partition order. If the
|
|
|
|
* partition pathkeys happen to be contained in pathkeys then it still
|
|
|
|
* works, as described above, providing that the partition pathkeys
|
|
|
|
* are complete and not just a prefix of the partition keys. (In such
|
|
|
|
* cases we'll be relying on the child paths to have sorted the
|
|
|
|
* lower-order columns of the required pathkeys.)
|
|
|
|
*/
|
|
|
|
match_partition_order =
|
|
|
|
pathkeys_contained_in(pathkeys, partition_pathkeys) ||
|
|
|
|
(!partition_pathkeys_partial &&
|
|
|
|
pathkeys_contained_in(partition_pathkeys, pathkeys));
|
|
|
|
|
|
|
|
match_partition_order_desc = !match_partition_order &&
|
|
|
|
(pathkeys_contained_in(pathkeys, partition_pathkeys_desc) ||
|
|
|
|
(!partition_pathkeys_desc_partial &&
|
|
|
|
pathkeys_contained_in(partition_pathkeys_desc, pathkeys)));
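		/*
		 * (Comment added for clarity) The descending test is attempted
		 * only when the ascending case failed; the !match_partition_order
		 * guard ensures each pathkey list is handled by at most one of
		 * the two plain-Append strategies below.
		 */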
|
2010-10-14 22:56:39 +02:00
|
|
|
|
|
|
|
/* Select the child paths for this ordering... */
|
|
|
|
foreach(lcr, live_childrels)
|
|
|
|
{
|
|
|
|
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
|
|
|
|
Path *cheapest_startup,
|
|
|
|
*cheapest_total;
|
|
|
|
|
|
|
|
/* Locate the right paths, if they are available. */
|
|
|
|
cheapest_startup =
|
|
|
|
get_cheapest_path_for_pathkeys(childrel->pathlist,
|
|
|
|
pathkeys,
|
2012-04-19 21:52:46 +02:00
|
|
|
NULL,
|
2017-03-07 16:33:29 +01:00
|
|
|
STARTUP_COST,
|
|
|
|
false);
|
2010-10-14 22:56:39 +02:00
|
|
|
cheapest_total =
|
|
|
|
get_cheapest_path_for_pathkeys(childrel->pathlist,
|
|
|
|
pathkeys,
|
2012-04-19 21:52:46 +02:00
|
|
|
NULL,
|
2017-03-07 16:33:29 +01:00
|
|
|
TOTAL_COST,
|
|
|
|
false);
|
2010-10-14 22:56:39 +02:00
|
|
|
|
|
|
|
/*
|
2012-01-28 01:26:38 +01:00
|
|
|
* If we can't find any paths with the right order just use the
|
2012-04-19 21:52:46 +02:00
|
|
|
* cheapest-total path; we'll have to sort it later.
|
2010-10-14 22:56:39 +02:00
|
|
|
*/
|
2012-01-28 01:26:38 +01:00
|
|
|
if (cheapest_startup == NULL || cheapest_total == NULL)
|
|
|
|
{
|
2012-04-19 21:52:46 +02:00
|
|
|
cheapest_startup = cheapest_total =
|
|
|
|
childrel->cheapest_total_path;
|
2012-08-12 00:42:20 +02:00
|
|
|
/* Assert we do have an unparameterized path for this child */
|
|
|
|
Assert(cheapest_total->param_info == NULL);
|
2012-01-28 01:26:38 +01:00
|
|
|
}
|
2010-10-14 22:56:39 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Notice whether we actually have different paths for the
|
2011-04-10 17:42:00 +02:00
|
|
|
* "cheapest" and "total" cases; frequently there will be no point
|
|
|
|
* in two create_merge_append_path() calls.
|
2010-10-14 22:56:39 +02:00
|
|
|
*/
|
|
|
|
if (cheapest_startup != cheapest_total)
|
|
|
|
startup_neq_total = true;
|
|
|
|
|
2019-04-06 01:20:30 +02:00
|
|
|
/*
|
|
|
|
* Collect the appropriate child paths. The required logic varies
|
|
|
|
* for the Append and MergeAppend cases.
|
|
|
|
*/
|
|
|
|
if (match_partition_order)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* We're going to make a plain Append path. We don't need
|
|
|
|
* most of what accumulate_append_subpath would do, but we do
|
|
|
|
* want to cut out child Appends or MergeAppends if they have
|
|
|
|
* just a single subpath (and hence aren't doing anything
|
|
|
|
* useful).
|
|
|
|
*/
|
|
|
|
cheapest_startup = get_singleton_append_subpath(cheapest_startup);
|
|
|
|
cheapest_total = get_singleton_append_subpath(cheapest_total);
|
|
|
|
|
|
|
|
startup_subpaths = lappend(startup_subpaths, cheapest_startup);
|
|
|
|
total_subpaths = lappend(total_subpaths, cheapest_total);
|
|
|
|
}
|
|
|
|
else if (match_partition_order_desc)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* As above, but we need to reverse the order of the children,
|
|
|
|
* because nodeAppend.c doesn't know anything about reverse
|
|
|
|
* ordering and will scan the children in the order presented.
|
|
|
|
*/
|
|
|
|
cheapest_startup = get_singleton_append_subpath(cheapest_startup);
|
|
|
|
cheapest_total = get_singleton_append_subpath(cheapest_total);
|
|
|
|
|
|
|
|
startup_subpaths = lcons(cheapest_startup, startup_subpaths);
|
|
|
|
total_subpaths = lcons(cheapest_total, total_subpaths);
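				/*
				 * (Comment added for clarity) lcons() prepends, so walking
				 * live_childrels in ascending partition order while
				 * prepending here leaves the subpath lists in descending
				 * partition order, which is the order the executor will
				 * scan them in.
				 */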
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Otherwise, rely on accumulate_append_subpath to collect the
|
|
|
|
* child paths for the MergeAppend.
|
|
|
|
*/
|
|
|
|
accumulate_append_subpath(cheapest_startup,
|
|
|
|
&startup_subpaths, NULL);
|
|
|
|
accumulate_append_subpath(cheapest_total,
|
|
|
|
&total_subpaths, NULL);
|
|
|
|
}
|
2010-10-14 22:56:39 +02:00
|
|
|
}
|
|
|
|
|
2019-04-06 01:20:30 +02:00
|
|
|
/* ... and build the Append or MergeAppend paths */
|
|
|
|
if (match_partition_order || match_partition_order_desc)
|
|
|
|
{
|
|
|
|
/* We only need Append */
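			/*
			 * (Comment added for clarity) The remaining create_append_path
			 * arguments: no partial subpaths (NIL), the required pathkeys,
			 * no required outer rels (NULL), zero parallel workers, not
			 * parallel-aware, and rows = -1 so the path computes its own
			 * row estimate.
			 */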
|
|
|
|
add_path(rel, (Path *) create_append_path(root,
|
|
|
|
rel,
|
|
|
|
startup_subpaths,
|
|
|
|
NIL,
|
|
|
|
pathkeys,
|
|
|
|
NULL,
|
|
|
|
0,
|
|
|
|
false,
|
|
|
|
partitioned_rels,
|
|
|
|
-1));
|
|
|
|
if (startup_neq_total)
|
|
|
|
add_path(rel, (Path *) create_append_path(root,
|
|
|
|
rel,
|
|
|
|
total_subpaths,
|
|
|
|
NIL,
|
|
|
|
pathkeys,
|
|
|
|
NULL,
|
|
|
|
0,
|
|
|
|
false,
|
|
|
|
partitioned_rels,
|
|
|
|
-1));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* We need MergeAppend */
|
2010-10-14 22:56:39 +02:00
|
|
|
add_path(rel, (Path *) create_merge_append_path(root,
|
|
|
|
rel,
|
2019-04-06 01:20:30 +02:00
|
|
|
startup_subpaths,
|
2012-04-19 21:52:46 +02:00
|
|
|
pathkeys,
|
2017-03-21 14:48:04 +01:00
|
|
|
NULL,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
partitioned_rels));
|
2019-04-06 01:20:30 +02:00
|
|
|
if (startup_neq_total)
|
|
|
|
add_path(rel, (Path *) create_merge_append_path(root,
|
|
|
|
rel,
|
|
|
|
total_subpaths,
|
|
|
|
pathkeys,
|
|
|
|
NULL,
|
|
|
|
partitioned_rels));
|
|
|
|
}
|
2010-10-14 22:56:39 +02:00
|
|
|
}
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2013-07-08 04:37:24 +02:00
|
|
|
/*
|
|
|
|
* get_cheapest_parameterized_child_path
|
|
|
|
* Get cheapest path for this relation that has exactly the requested
|
|
|
|
* parameterization.
|
|
|
|
*
|
|
|
|
* Returns NULL if unable to create such a path.
|
|
|
|
*/
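/*
 * Illustrative example (added; not in the original source): with
 * required_outer = {A} and existing paths parameterized by {} and by
 * {A,B}, only the unparameterized path is usable; it can be
 * reparameterized up to {A}, while the {A,B} path needs more than the
 * requested parameterization and must be skipped.
 */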
|
|
|
|
static Path *
|
|
|
|
get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
|
|
|
|
Relids required_outer)
|
|
|
|
{
|
|
|
|
Path *cheapest;
|
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Look up the cheapest existing path with no more than the needed
|
|
|
|
* parameterization. If it has exactly the needed parameterization, we're
|
|
|
|
* done.
|
|
|
|
*/
|
|
|
|
cheapest = get_cheapest_path_for_pathkeys(rel->pathlist,
|
|
|
|
NIL,
|
|
|
|
required_outer,
|
2017-03-07 16:33:29 +01:00
|
|
|
TOTAL_COST,
|
|
|
|
false);
|
2013-07-08 04:37:24 +02:00
|
|
|
Assert(cheapest != NULL);
|
|
|
|
if (bms_equal(PATH_REQ_OUTER(cheapest), required_outer))
|
|
|
|
return cheapest;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Otherwise, we can "reparameterize" an existing path to match the given
|
|
|
|
* parameterization, which effectively means pushing down additional
|
|
|
|
* joinquals to be checked within the path's scan. However, some existing
|
|
|
|
* paths might check the available joinquals already while others don't;
|
|
|
|
* therefore, it's not clear which existing path will be cheapest after
|
2014-05-06 18:12:18 +02:00
|
|
|
* reparameterization. We have to go through them all and find out.
|
2013-07-08 04:37:24 +02:00
|
|
|
*/
|
|
|
|
cheapest = NULL;
|
|
|
|
foreach(lc, rel->pathlist)
|
|
|
|
{
|
|
|
|
Path *path = (Path *) lfirst(lc);
|
|
|
|
|
|
|
|
/* Can't use it if it needs more than requested parameterization */
|
|
|
|
if (!bms_is_subset(PATH_REQ_OUTER(path), required_outer))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Reparameterization can only increase the path's cost, so if it's
|
|
|
|
* already more expensive than the current cheapest, forget it.
|
|
|
|
*/
|
|
|
|
if (cheapest != NULL &&
|
|
|
|
compare_path_costs(cheapest, path, TOTAL_COST) <= 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* Reparameterize if needed, then recheck cost */
|
|
|
|
if (!bms_equal(PATH_REQ_OUTER(path), required_outer))
|
|
|
|
{
|
|
|
|
path = reparameterize_path(root, path, required_outer, 1.0);
|
|
|
|
if (path == NULL)
|
|
|
|
continue; /* failed to reparameterize this one */
|
|
|
|
Assert(bms_equal(PATH_REQ_OUTER(path), required_outer));
|
|
|
|
|
|
|
|
if (cheapest != NULL &&
|
|
|
|
compare_path_costs(cheapest, path, TOTAL_COST) <= 0)
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* We have a new best path */
|
|
|
|
cheapest = path;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Return the best path, or NULL if we found no suitable candidate */
|
|
|
|
return cheapest;
|
|
|
|
}
|
|
|
|
|
2010-10-14 22:56:39 +02:00
|
|
|
/*
|
|
|
|
* accumulate_append_subpath
|
Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention. We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.
Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 23:28:39 +01:00
|
|
|
* Add a subpath to the list being built for an Append or MergeAppend.
|
2010-10-14 22:56:39 +02:00
|
|
|
*
|
2014-03-28 16:50:01 +01:00
|
|
|
* It's possible that the child is itself an Append or MergeAppend path, in
|
|
|
|
* which case we can "cut out the middleman" and just add its child paths to
|
|
|
|
* our own list. (We don't try to do this earlier because we need to apply
|
|
|
|
* both levels of transformation to the quals.)
|
|
|
|
*
|
|
|
|
* Note that if we omit a child MergeAppend in this way, we are effectively
|
|
|
|
* omitting a sort step, which seems fine: if the parent is to be an Append,
|
|
|
|
* its result would be unsorted anyway, while if the parent is to be a
|
|
|
|
* MergeAppend, there's no point in a separate sort on a child.
|
2017-12-05 23:28:39 +01:00
|
|
|
*
|
|
|
|
* Normally, either path is a partial path and subpaths is a list of partial
|
|
|
|
* paths, or else path is a non-partial plan and subpaths is a list of those.
|
|
|
|
* However, if path is a parallel-aware Append, then we add its partial path
|
|
|
|
* children to subpaths and the rest to special_subpaths. If the latter is
|
|
|
|
* NULL, we don't flatten the path at all (unless it contains only partial
|
|
|
|
* paths).
|
2010-10-14 22:56:39 +02:00
|
|
|
*/
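/*
 * Illustrative example (added; not in the original source): if the
 * child path is itself an AppendPath with subpaths (A, B), then
 * instead of appending the child Append node we append A and B
 * directly, so a list that held (X) becomes (X, A, B) and the
 * intermediate Append disappears from the plan.
 */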
|
2017-12-05 23:28:39 +01:00
|
|
|
static void
|
|
|
|
accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
|
2010-10-14 22:56:39 +02:00
|
|
|
{
|
|
|
|
if (IsA(path, AppendPath))
|
|
|
|
{
|
2011-04-10 17:42:00 +02:00
|
|
|
AppendPath *apath = (AppendPath *) path;
|
2010-10-14 22:56:39 +02:00
|
|
|
|
2017-12-05 23:28:39 +01:00
|
|
|
if (!apath->path.parallel_aware || apath->first_partial_path == 0)
|
|
|
|
{
|
Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input. Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".
To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input. (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)
In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it. We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak, not whether there is one. I did adjust two or
three places that had been taking the trouble to free that header so that
they manually free the whole second List.)
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 17:20:18 +02:00
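Because the redesigned List is a flat array of elements rather than a chain of separately allocated cells, concatenation now copies the second list's element values and leaves that list's own storage behind, which is exactly the leak weighed above. A standalone toy of that shape (invented ToyList type, int elements only, no error handling; not pg_list.h):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyList
{
	int			n;
	int		   *elems;
} ToyList;

/*
 * Extend list1 in place with list2's values.  list2 is untouched, so if
 * the caller drops its last pointer to list2, that storage is leaked.
 */
static ToyList *
toy_concat(ToyList *list1, const ToyList *list2)
{
	list1->elems = realloc(list1->elems,
						   (list1->n + list2->n) * sizeof(int));
	memcpy(list1->elems + list1->n, list2->elems, list2->n * sizeof(int));
	list1->n += list2->n;
	return list1;
}

int
main(void)
{
	int			b[] = {3, 4};
	ToyList		x = {0, NULL};
	ToyList		y = {2, b};

	toy_concat(&x, &y);
	printf("x has %d elems; y still has %d\n", x.n, y.n);	/* 2 and 2 */
	return 0;
}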
|
|
|
*subpaths = list_concat(*subpaths, apath->subpaths);
|
2017-12-05 23:28:39 +01:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
else if (special_subpaths != NULL)
|
|
|
|
{
|
|
|
|
List *new_special_subpaths;
|
|
|
|
|
|
|
|
/* Split Parallel Append into partial and non-partial subpaths */
|
|
|
|
*subpaths = list_concat(*subpaths,
|
|
|
|
list_copy_tail(apath->subpaths,
|
|
|
|
apath->first_partial_path));
|
|
|
|
new_special_subpaths =
|
|
|
|
list_truncate(list_copy(apath->subpaths),
|
|
|
|
apath->first_partial_path);
|
|
|
|
*special_subpaths = list_concat(*special_subpaths,
|
|
|
|
new_special_subpaths);
|
2018-01-10 17:18:40 +01:00
|
|
|
return;
|
2017-12-05 23:28:39 +01:00
|
|
|
}
|
2010-10-14 22:56:39 +02:00
|
|
|
}
|
2014-03-28 16:50:01 +01:00
|
|
|
else if (IsA(path, MergeAppendPath))
|
|
|
|
{
|
|
|
|
MergeAppendPath *mpath = (MergeAppendPath *) path;
|
|
|
|
|
2019-08-12 17:20:18 +02:00
|
|
|
*subpaths = list_concat(*subpaths, mpath->subpaths);
|
2017-12-05 23:28:39 +01:00
|
|
|
return;
|
2014-03-28 16:50:01 +01:00
|
|
|
}
|
2017-12-05 23:28:39 +01:00
|
|
|
|
|
|
|
*subpaths = lappend(*subpaths, path);
|
2010-10-14 22:56:39 +02:00
|
|
|
}
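To make the parallel-aware split above concrete: an AppendPath keeps its non-partial subpaths first, and first_partial_path is the index of the first partial one. A minimal standalone sketch of the same partition over a plain array (not PostgreSQL's List code; the path names are made up):

#include <stdio.h>

static void
split_subpaths(const char *subpaths[], int n, int first_partial)
{
	/* Non-partial subpaths go to the caller's special_subpaths list. */
	printf("non-partial (special):");
	for (int i = 0; i < first_partial; i++)
		printf(" %s", subpaths[i]);

	/* Partial subpaths can be flattened into the parent's subpaths. */
	printf("\npartial (flattened):");
	for (int i = first_partial; i < n; i++)
		printf(" %s", subpaths[i]);
	printf("\n");
}

int
main(void)
{
	const char *subpaths[] = {"SeqScan(p1)", "SeqScan(p2)",
							  "PartialSeqScan(p3)", "PartialSeqScan(p4)"};

	split_subpaths(subpaths, 4, 2);		/* first_partial_path == 2 */
	return 0;
}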
|
|
|
|
|
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after starting up just
the first child node, not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-06 01:20:30 +02:00
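The klugy workaround above is just a sum in the cost model; a minimal standalone sketch of the estimate an ordered Append advertises, rather than the first-child-only figure it deliberately avoids:

#include <stdio.h>

/*
 * Conservative estimate: sum the children's startup costs, as a
 * MergeAppend (which must start every child) would.
 */
static double
ordered_append_startup(const double child_startup[], int nchildren)
{
	double		total = 0.0;

	for (int i = 0; i < nchildren; i++)
		total += child_startup[i];
	return total;
}

int
main(void)
{
	/*
	 * IndexScan children with zero startup cost: the conservative
	 * estimate is still zero, so the interesting cases are unhurt.
	 */
	double		kids[] = {0.0, 0.0, 0.0};

	printf("startup = %.1f\n", ordered_append_startup(kids, 3));
	return 0;
}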
|
|
|
/*
|
|
|
|
* get_singleton_append_subpath
|
|
|
|
* Returns the single subpath of an Append/MergeAppend, or just
|
|
|
|
* returns 'path' if it's not a single-subpath Append/MergeAppend.
|
|
|
|
*
|
|
|
|
* Note: 'path' must not be a parallel-aware path.
|
|
|
|
*/
|
|
|
|
static Path *
|
|
|
|
get_singleton_append_subpath(Path *path)
|
|
|
|
{
|
|
|
|
Assert(!path->parallel_aware);
|
|
|
|
|
|
|
|
if (IsA(path, AppendPath))
|
|
|
|
{
|
|
|
|
AppendPath *apath = (AppendPath *) path;
|
|
|
|
|
|
|
|
if (list_length(apath->subpaths) == 1)
|
|
|
|
return (Path *) linitial(apath->subpaths);
|
|
|
|
}
|
|
|
|
else if (IsA(path, MergeAppendPath))
|
|
|
|
{
|
|
|
|
MergeAppendPath *mpath = (MergeAppendPath *) path;
|
|
|
|
|
|
|
|
if (list_length(mpath->subpaths) == 1)
|
|
|
|
return (Path *) linitial(mpath->subpaths);
|
|
|
|
}
|
|
|
|
|
|
|
|
return path;
|
|
|
|
}
|
|
|
|
|
2007-05-26 20:23:02 +02:00
|
|
|
/*
|
|
|
|
* set_dummy_rel_pathlist
|
|
|
|
* Build a dummy path for a relation that's been excluded by constraints
|
|
|
|
*
|
|
|
|
* Rather than inventing a special "dummy" path type, we represent this as an
|
Fix handling of targetlist SRFs when scan/join relation is known empty.
When we introduced separate ProjectSetPath nodes for application of
set-returning functions in v10, we inadvertently broke some cases where
we're supposed to recognize that the result of a subquery is known to be
empty (contain zero rows). That's because IS_DUMMY_REL was just looking
for a childless AppendPath without allowing for a ProjectSetPath being
possibly stuck on top. In itself, this didn't do anything much worse
than produce slightly worse plans for some corner cases.
Then in v11, commit 11cf92f6e rearranged things to allow the scan/join
targetlist to be applied directly to partial paths before they get
gathered. But it inserted a short-circuit path for dummy relations
that was a little too short: it failed to insert a ProjectSetPath node
at all for a targetlist containing set-returning functions, resulting in
bogus "set-valued function called in context that cannot accept a set"
errors, as reported in bug #15669 from Madelaine Thibaut.
The best way to fix this mess seems to be to reimplement IS_DUMMY_REL
so that it drills down through any ProjectSetPath nodes that might be
there (and it seems like we'd better allow for ProjectionPath as well).
While we're at it, make it look at rel->pathlist not cheapest_total_path,
so that it gives the right answer independently of whether set_cheapest
has been done lately. That dependency looks pretty shaky in the context
of code like apply_scanjoin_target_to_paths, and even if it's not broken
today it'd certainly bite us at some point. (Nastily, unsafe use of the
old coding would almost always work; the hazard comes down to possibly
looking through a dangling pointer, and only once in a blue moon would
you find something there that resulted in the wrong answer.)
It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if
there are any extensions using it, they'll continue to use the old
inadequate logic until they're recompiled, after which they'll fail
to load into server versions predating this fix. Hopefully there are
few such extensions.
Having fixed IS_DUMMY_REL, the special path for dummy rels in
apply_scanjoin_target_to_paths is unnecessary as well as being wrong,
so we can just drop it.
Also change a few places that were testing for partitioned-ness of a
planner relation but not using IS_PARTITIONED_REL for the purpose; that
seems unsafe as well as inconsistent, plus it required an ugly hack in
apply_scanjoin_target_to_paths.
In passing, save a few cycles in apply_scanjoin_target_to_paths by
skipping processing of pre-existing paths for partitioned rels,
and do some cosmetic cleanup and comment adjustment in that function.
I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking
any code that might be using it, since in almost every case that would
be wrong; IS_DUMMY_REL is what should be used instead.
In HEAD, also make set_dummy_rel_pathlist static (since it's no longer
used from outside allpaths.c), and delete is_dummy_plan, since it's no
longer used anywhere.
Back-patch as appropriate into v11 and v10.
Tom Lane and Julien Rouhaud
Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org
2019-03-07 20:21:52 +01:00
|
|
|
* AppendPath with no members (see also IS_DUMMY_APPEND/IS_DUMMY_REL macros).
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
*
|
2019-03-07 20:21:52 +01:00
|
|
|
* (See also mark_dummy_rel, which does basically the same thing, but is
|
|
|
|
* typically used to change a rel into dummy state after we already made
|
|
|
|
* paths for it.)
|
2007-05-26 20:23:02 +02:00
|
|
|
*/
|
2019-03-07 20:21:52 +01:00
|
|
|
static void
|
2007-05-26 20:23:02 +02:00
|
|
|
set_dummy_rel_pathlist(RelOptInfo *rel)
|
|
|
|
{
|
|
|
|
/* Set dummy size estimates --- we leave attr_widths[] as zeroes */
|
|
|
|
rel->rows = 0;
|
2016-03-14 21:59:59 +01:00
|
|
|
rel->reltarget->width = 0;
|
2007-05-26 20:23:02 +02:00
|
|
|
|
2012-01-28 01:26:38 +01:00
|
|
|
/* Discard any pre-existing paths; no further need for them */
|
|
|
|
rel->pathlist = NIL;
|
2016-01-20 20:29:22 +01:00
|
|
|
rel->partial_pathlist = NIL;
|
2012-01-28 01:26:38 +01:00
|
|
|
|
2019-03-07 20:21:52 +01:00
|
|
|
/* Set up the dummy path */
|
2019-03-14 17:16:09 +01:00
|
|
|
add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL,
|
2019-04-06 01:20:30 +02:00
|
|
|
NIL, rel->lateral_relids,
|
2017-12-05 23:28:39 +01:00
|
|
|
0, false, NIL, -1));
|
2007-05-26 20:23:02 +02:00
|
|
|
|
2014-11-21 20:05:46 +01:00
|
|
|
/*
|
2019-03-07 20:21:52 +01:00
|
|
|
* We set the cheapest-path fields immediately, just in case they were
|
|
|
|
* pointing at some discarded path. This is redundant when we're called
|
|
|
|
* from set_rel_size(), but not when called from elsewhere, and doing it
|
|
|
|
* twice is harmless anyway.
|
2014-11-21 20:05:46 +01:00
|
|
|
*/
|
2007-05-26 20:23:02 +02:00
|
|
|
set_cheapest(rel);
|
|
|
|
}
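A standalone toy model of the convention this function establishes (invented structs, not the real RelOptInfo/AppendPath, and omitting the ProjectionPath/ProjectSetPath drill-down that the real IS_DUMMY_REL test performs): a rel is dummy when its only surviving path is an Append with no subpaths, which provably returns zero rows.

#include <stdbool.h>
#include <stdio.h>

typedef struct ToyAppendPath
{
	int			nsubpaths;
} ToyAppendPath;

typedef struct ToyRel
{
	double		rows;
	ToyAppendPath *only_path;
} ToyRel;

static void
toy_mark_dummy(ToyRel *rel, ToyAppendPath *append)
{
	rel->rows = 0;				/* dummy size estimate */
	append->nsubpaths = 0;		/* childless Append yields no tuples */
	rel->only_path = append;	/* discard any pre-existing paths */
}

static bool
toy_is_dummy(const ToyRel *rel)
{
	return rel->only_path != NULL && rel->only_path->nsubpaths == 0;
}

int
main(void)
{
	ToyAppendPath ap;
	ToyRel		rel = {100.0, NULL};

	toy_mark_dummy(&rel, &ap);
	printf("dummy? %s\n", toy_is_dummy(&rel) ? "yes" : "no");
	return 0;
}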
|
|
|
|
|
2005-06-10 05:32:25 +02:00
|
|
|
/* quick-and-dirty test to see if any joining is needed */
|
|
|
|
static bool
|
|
|
|
has_multiple_baserels(PlannerInfo *root)
|
|
|
|
{
|
|
|
|
int num_base_rels = 0;
|
|
|
|
Index rti;
|
|
|
|
|
2006-01-31 22:39:25 +01:00
|
|
|
for (rti = 1; rti < root->simple_rel_array_size; rti++)
|
2005-06-10 05:32:25 +02:00
|
|
|
{
|
2006-01-31 22:39:25 +01:00
|
|
|
RelOptInfo *brel = root->simple_rel_array[rti];
|
2005-06-10 05:32:25 +02:00
|
|
|
|
|
|
|
if (brel == NULL)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* ignore RTEs that are "other rels" */
|
|
|
|
if (brel->reloptkind == RELOPT_BASEREL)
|
|
|
|
if (++num_base_rels > 1)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2001-07-16 19:57:02 +02:00
|
|
|
/*
|
|
|
|
* set_subquery_pathlist
|
2016-03-07 21:58:22 +01:00
|
|
|
* Generate SubqueryScan access paths for a subquery RTE
|
2012-01-28 01:26:38 +01:00
|
|
|
*
|
2012-08-08 01:02:54 +02:00
|
|
|
* We don't currently support generating parameterized paths for subqueries
|
|
|
|
* by pushing join clauses down into them; it seems too expensive to re-plan
|
2016-03-07 21:58:22 +01:00
|
|
|
* the subquery multiple times to consider different alternatives.
|
|
|
|
* (XXX that could stand to be reconsidered, now that we use Paths.)
|
|
|
|
* So the paths made here will be parameterized if the subquery contains
|
|
|
|
* LATERAL references, otherwise not. As long as that's true, there's no need
|
|
|
|
* for a separate set_subquery_size phase: just make the paths right away.
|
2001-07-16 19:57:02 +02:00
|
|
|
*/
|
|
|
|
static void
|
2005-06-06 00:32:58 +02:00
|
|
|
set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
|
2001-07-16 19:57:02 +02:00
|
|
|
Index rti, RangeTblEntry *rte)
|
|
|
|
{
|
2005-06-10 05:32:25 +02:00
|
|
|
Query *parse = root->parse;
|
2001-07-16 19:57:02 +02:00
|
|
|
Query *subquery = rte->subquery;
|
2012-08-08 01:02:54 +02:00
|
|
|
Relids required_outer;
|
2014-06-27 20:08:48 +02:00
|
|
|
pushdown_safety_info safetyInfo;
|
2005-06-10 05:32:25 +02:00
|
|
|
double tuple_fraction;
|
2016-03-07 21:58:22 +01:00
|
|
|
RelOptInfo *sub_final_rel;
|
|
|
|
ListCell *lc;
|
2001-07-16 19:57:02 +02:00
|
|
|
|
2009-03-10 21:58:26 +01:00
|
|
|
/*
|
|
|
|
* Must copy the Query so that planning doesn't mess up the RTE contents
|
|
|
|
* (really really need to fix the planner to not scribble on its input,
|
2014-06-12 19:12:53 +02:00
|
|
|
* someday ... but see remove_unused_subquery_outputs to start with).
|
2009-03-10 21:58:26 +01:00
|
|
|
*/
|
|
|
|
subquery = copyObject(subquery);
|
|
|
|
|
2012-08-08 01:02:54 +02:00
|
|
|
/*
|
|
|
|
* If it's a LATERAL subquery, it might contain some Vars of the current
|
2012-08-27 04:48:55 +02:00
|
|
|
* query level, requiring it to be treated as parameterized, even though
|
|
|
|
* we don't support pushing down join quals into subqueries.
|
2012-08-08 01:02:54 +02:00
|
|
|
*/
|
2012-08-27 04:48:55 +02:00
|
|
|
required_outer = rel->lateral_relids;
|
2012-08-08 01:02:54 +02:00
|
|
|
|
2013-06-06 05:44:02 +02:00
|
|
|
/*
|
2014-06-27 20:08:48 +02:00
|
|
|
* Zero out result area for subquery_is_pushdown_safe, so that it can set
|
|
|
|
* flags as needed while recursing. In particular, we need a workspace
|
|
|
|
* for keeping track of unsafe-to-reference columns. unsafeColumns[i]
|
2017-08-16 06:22:32 +02:00
|
|
|
* will be set true if we find that output column i of the subquery is
|
2014-06-27 20:08:48 +02:00
|
|
|
* unsafe to use in a pushed-down qual.
|
2013-06-06 05:44:02 +02:00
|
|
|
*/
|
2014-06-27 20:08:48 +02:00
|
|
|
memset(&safetyInfo, 0, sizeof(safetyInfo));
|
|
|
|
safetyInfo.unsafeColumns = (bool *)
|
2004-05-31 01:40:41 +02:00
|
|
|
palloc0((list_length(subquery->targetList) + 1) * sizeof(bool));
|
2003-04-25 01:43:09 +02:00
|
|
|
|
2014-06-27 20:08:48 +02:00
|
|
|
/*
|
|
|
|
* If the subquery has the "security_barrier" flag, it means the subquery
|
Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies. This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.
The security consideration for COPY with row level security, also
pointed out by Robert, has been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.
Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places. This patch fixes that too, which also happens to add the
ability to COMMENT on policies.
In passing, attempt to improve the consistency of messages, comments,
and documentation as well. This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.
Happy Thanksgiving!
2014-11-27 07:06:36 +01:00
|
|
|
* originated from a view that must enforce row level security. Then we
|
2014-06-27 20:08:48 +02:00
|
|
|
* must not push down quals that contain leaky functions. (Ideally this
|
|
|
|
* would be checked inside subquery_is_pushdown_safe, but since we don't
|
|
|
|
* currently pass the RTE to that function, we must do it here.)
|
|
|
|
*/
|
|
|
|
safetyInfo.unsafeLeaky = rte->security_barrier;
|
|
|
|
|
2001-07-16 19:57:02 +02:00
|
|
|
/*
|
|
|
|
* If there are any restriction clauses that have been attached to the
|
2005-10-15 04:49:52 +02:00
|
|
|
* subquery relation, consider pushing them down to become WHERE or HAVING
|
|
|
|
* quals of the subquery itself. This transformation is useful because it
|
|
|
|
* may allow us to generate a better plan for the subquery than evaluating
|
|
|
|
* all the subquery output rows and then filtering them.
|
2001-07-16 19:57:02 +02:00
|
|
|
*
|
2005-10-15 04:49:52 +02:00
|
|
|
* There are several cases where we cannot push down clauses. Restrictions
|
|
|
|
* involving the subquery are checked by subquery_is_pushdown_safe().
|
|
|
|
* Restrictions on individual clauses are checked by
|
Revise the planner's handling of "pseudoconstant" WHERE clauses, that is
clauses containing no variables and no volatile functions. Such a clause
can be used as a one-time qual in a gating Result plan node, to suppress
plan execution entirely when it is false. Even when the clause is true,
putting it in a gating node wins by avoiding repeated evaluation of the
clause. In previous PG releases, query_planner() would do this for
pseudoconstant clauses appearing at the top level of the jointree, but
there was no ability to generate a gating Result deeper in the plan tree.
To fix it, get rid of the special case in query_planner(), and instead
process pseudoconstant clauses through the normal RestrictInfo qual
distribution mechanism. When a pseudoconstant clause is found attached to
a path node in create_plan(), pull it out and generate a gating Result at
that point. This requires special-casing pseudoconstants in selectivity
estimation and cost_qual_eval, but on the whole it's pretty clean.
It probably even makes the planner a bit faster than before for the normal
case of no pseudoconstants, since removing pull_constant_clauses saves one
useless traversal of the qual tree. Per gripe from Phil Frost.
2006-07-01 20:38:33 +02:00
|
|
|
* qual_is_pushdown_safe(). Also, we don't want to push down
|
|
|
|
* pseudoconstant clauses; better to have the gating node above the
|
|
|
|
* subquery.
|
2001-07-16 19:57:02 +02:00
|
|
|
*
|
2005-11-22 19:17:34 +01:00
|
|
|
* Non-pushed-down clauses will get evaluated as qpquals of the
|
|
|
|
* SubqueryScan node.
|
2001-07-16 19:57:02 +02:00
|
|
|
*
|
|
|
|
* XXX Are there any cases where we want to make a policy decision not to
|
2003-03-22 02:49:38 +01:00
|
|
|
* push down a pushable qual, because it'd result in a worse plan?
|
2001-07-16 19:57:02 +02:00
|
|
|
*/
|
2002-08-29 18:03:49 +02:00
|
|
|
if (rel->baserestrictinfo != NIL &&
|
2014-06-27 20:08:48 +02:00
|
|
|
subquery_is_pushdown_safe(subquery, subquery, &safetyInfo))
|
2001-07-16 19:57:02 +02:00
|
|
|
{
|
|
|
|
/* OK to consider pushing down individual quals */
|
|
|
|
List *upperrestrictlist = NIL;
|
2004-05-26 06:41:50 +02:00
|
|
|
ListCell *l;
|
2001-07-16 19:57:02 +02:00
|
|
|
|
2004-05-26 06:41:50 +02:00
|
|
|
foreach(l, rel->baserestrictinfo)
|
2001-07-16 19:57:02 +02:00
|
|
|
{
|
2004-05-26 06:41:50 +02:00
|
|
|
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
|
2001-07-16 19:57:02 +02:00
|
|
|
Node *clause = (Node *) rinfo->clause;
|
|
|
|
|
2006-07-01 20:38:33 +02:00
|
|
|
if (!rinfo->pseudoconstant &&
|
2014-06-27 20:08:48 +02:00
|
|
|
qual_is_pushdown_safe(subquery, rti, clause, &safetyInfo))
|
2001-07-16 19:57:02 +02:00
|
|
|
{
|
2003-03-22 02:49:38 +01:00
|
|
|
/* Push it down */
|
2005-06-04 21:19:42 +02:00
|
|
|
subquery_push_qual(subquery, rte, rti, clause);
|
2001-07-16 19:57:02 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2003-03-22 02:49:38 +01:00
|
|
|
/* Keep it in the upper query */
|
|
|
|
upperrestrictlist = lappend(upperrestrictlist, rinfo);
|
2001-07-16 19:57:02 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
rel->baserestrictinfo = upperrestrictlist;
|
Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 18:58:20 +01:00
|
|
|
/* We don't bother recomputing baserestrict_min_security */
|
2001-07-16 19:57:02 +02:00
|
|
|
}
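For instance, in SELECT * FROM (SELECT x, sum(y) AS s FROM t GROUP BY x) ss WHERE ss.x = 42, the clause ss.x = 42 can safely be evaluated inside the subquery before aggregation, while a clause invoking a volatile function must stay above. A standalone sketch of the partition performed by the loop above, assuming a caller-supplied safety test (the string matching is only a stand-in for qual_is_pushdown_safe):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in safety test: pretend quals mentioning random() are unsafe. */
static bool
toy_qual_is_pushdown_safe(const char *qual)
{
	return strstr(qual, "random()") == NULL;
}

int
main(void)
{
	const char *quals[] = {"x = 42", "y < random()", "z IS NOT NULL"};

	for (int i = 0; i < 3; i++)
	{
		if (toy_qual_is_pushdown_safe(quals[i]))
			printf("push down:  %s\n", quals[i]);	/* becomes subquery qual */
		else
			printf("keep above: %s\n", quals[i]);	/* stays as upper qual */
	}
	return 0;
}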
|
|
|
|
|
2014-06-27 20:08:48 +02:00
|
|
|
pfree(safetyInfo.unsafeColumns);
|
2003-04-25 01:43:09 +02:00
|
|
|
|
2014-06-12 19:12:53 +02:00
|
|
|
/*
|
|
|
|
* The upper query might not use all the subquery's output columns; if
|
|
|
|
* not, we can simplify.
|
|
|
|
*/
|
|
|
|
remove_unused_subquery_outputs(subquery, rel);
|
|
|
|
|
2005-06-10 05:32:25 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* We can safely pass the outer tuple_fraction down to the subquery if the
|
|
|
|
* outer level has no joining, aggregation, or sorting to do. Otherwise
|
|
|
|
* we'd better tell the subquery to plan for full retrieval. (XXX This
|
|
|
|
* could probably be made more intelligent ...)
|
2005-06-10 05:32:25 +02:00
|
|
|
*/
|
|
|
|
if (parse->hasAggs ||
|
|
|
|
parse->groupClause ||
|
Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows aggregating data by different
GROUP BY clauses at once. Each grouping set returns rows in which the
columns grouped by other sets, but not by this one, are set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains), and easily allows
avoiding the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
|
|
|
parse->groupingSets ||
|
2005-06-10 05:32:25 +02:00
|
|
|
parse->havingQual ||
|
|
|
|
parse->distinctClause ||
|
|
|
|
parse->sortClause ||
|
|
|
|
has_multiple_baserels(root))
|
|
|
|
tuple_fraction = 0.0; /* default case */
|
|
|
|
else
|
|
|
|
tuple_fraction = root->tuple_fraction;
|
|
|
|
|
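    /*
     * Worked example (illustrative, not from the original source): for
     * "SELECT * FROM (SELECT * FROM t ORDER BY a) ss LIMIT 10", the outer
     * level does no joining, grouping, or sorting of its own, so the outer
     * tuple_fraction (ten rows' worth) can be passed down and the subquery
     * may legitimately prefer a fast-start plan.  If the outer query instead
     * sorted or aggregated ss, it would consume the subquery's entire
     * output, so the subquery must be told to plan for full retrieval
     * (tuple_fraction = 0.0).
     */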
Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evaluation mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
2012-09-05 18:54:03 +02:00
    /* plan_params should not be in use in current query level */
    Assert(root->plan_params == NIL);
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
    /* Generate a subroot and Paths for the subquery */
    rel->subroot = subquery_planner(root->glob, subquery,
                                    root,
                                    false, tuple_fraction);
2012-09-05 18:54:03 +02:00
    /* Isolate the params needed by this specific subplan */
    rel->subplan_params = root->plan_params;
    root->plan_params = NIL;
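    /*
     * Illustrative note (not from the original source): root->plan_params
     * acts as an accumulator that subquery_planner() fills with the
     * outer-level Params this subquery references.  Capturing the list into
     * rel->subplan_params and resetting the accumulator to NIL is the usual
     * save-and-reset idiom, so that params collected for one subplan never
     * leak into the planning of the next subquery at this level.
     */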
2011-09-25 01:33:16 +02:00
    /*
     * It's possible that constraint exclusion proved the subquery empty. If
     * so, it's desirable to produce an unadorned dummy path so that we will
     * recognize appropriate optimizations at this query level.
     */
2016-03-07 21:58:22 +01:00
    sub_final_rel = fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);

    if (IS_DUMMY_REL(sub_final_rel))
    {
        set_dummy_rel_pathlist(rel);
        return;
    }
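    /*
     * Illustrative example (not from the original source): if the subquery's
     * table carries "CHECK (a > 0)" and the subquery is
     * "SELECT * FROM t WHERE a < 0", constraint exclusion can prove it
     * returns no rows; its final rel comes back dummy, so the outer rel is
     * marked dummy too, letting optimizations such as join removal fire one
     * level up.
     */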
2016-03-07 21:58:22 +01:00
    /*
     * Mark rel with estimated output rows, width, etc.  Note that we have to
     * do this before generating outer-query paths, else cost_subqueryscan is
     * not happy.
     */
    set_subquery_size_estimates(root, rel);
2016-03-07 21:58:22 +01:00
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00

    /*
     * For each Path that subquery_planner produced, make a SubqueryScanPath
     * in the outer query.
     */
    foreach(lc, sub_final_rel->pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);
        List       *pathkeys;

        /* Convert subpath's pathkeys to outer representation */
        pathkeys = convert_subquery_pathkeys(root,
                                             rel,
                                             subpath->pathkeys,
                                             make_tlist_from_pathtarget(subpath->pathtarget));
2016-03-07 21:58:22 +01:00

        /* Generate outer path using this subpath */
        add_path(rel, (Path *)
                 create_subqueryscan_path(root, rel, subpath,
                                          pathkeys, required_outer));
    }
Let Parallel Append over simple UNION ALL have partial subpaths.
A simple UNION ALL gets flattened into an appendrel of subquery
RTEs, but up until now it's been impossible for the appendrel to use
the partial paths for the subqueries, so while we could implement the
appendrel as a Parallel Append, it could only be one with non-partial
paths as children.
There are three separate obstacles to removing that limitation.
First, when planning a subquery, propagate any partial paths to the
final_rel so that they are potentially visible to outer query levels
(but not if they have initPlans attached, because that wouldn't be
safe). Second, after planning a subquery, propagate any partial paths
for the final_rel to the subquery RTE in the outer query level in the
same way we do for non-partial paths. Third, teach finalize_plan() to
account for the possibility that the fake parameter we use for rescan
signalling when the plan contains a Gather (Merge) node may be
propagated from an outer query level.
Patch by me, reviewed and tested by Amit Khandekar, Rajkumar
Raghuwanshi, and Ashutosh Bapat. Test cases based on examples by
Rajkumar Raghuwanshi.
Discussion: http://postgr.es/m/CA+Tgmoa6L9A1nNCk3aTDVZLZ4KkHDn1+tm7mFyFvP+uQPS7bAg@mail.gmail.com
2018-03-13 21:34:08 +01:00
    /* If outer rel allows parallelism, do same for partial paths. */
    if (rel->consider_parallel && bms_is_empty(required_outer))
2018-03-13 21:34:08 +01:00
    {
        /* If consider_parallel is false, there should be no partial paths. */
        Assert(sub_final_rel->consider_parallel ||
               sub_final_rel->partial_pathlist == NIL);
2018-03-13 21:34:08 +01:00
        /* Same for partial paths. */
        foreach(lc, sub_final_rel->partial_pathlist)
        {
            Path       *subpath = (Path *) lfirst(lc);
            List       *pathkeys;

            /* Convert subpath's pathkeys to outer representation */
            pathkeys = convert_subquery_pathkeys(root,
                                                 rel,
                                                 subpath->pathkeys,
                                                 make_tlist_from_pathtarget(subpath->pathtarget));

            /* Generate outer path using this subpath */
            add_partial_path(rel, (Path *)
                             create_subqueryscan_path(root, rel, subpath,
                                                      pathkeys,
                                                      required_outer));
        }
2018-03-13 21:34:08 +01:00
    }
}

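/*
 * A minimal standalone sketch (hypothetical names and types, not the
 * PostgreSQL API) of the pattern used above: for each path the child level
 * produced, build a parent-level path and offer it to an add_path-style
 * filter that discards a new path when an existing one is at least as cheap
 * and just as usefully sorted.
 */
#include <stddef.h>

typedef struct ToyPath
{
    double      cost;           /* total cost estimate */
    int         sortcol;        /* output sort column, or -1 for unsorted */
    struct ToyPath *next;
} ToyPath;

/* add_path-style filter: keep newpath only if no existing path dominates it */
static ToyPath *
toy_add_path(ToyPath *pathlist, ToyPath *newpath)
{
    for (ToyPath *old = pathlist; old != NULL; old = old->next)
    {
        if (old->cost <= newpath->cost && old->sortcol == newpath->sortcol)
            return pathlist;    /* dominated; discard newpath */
    }
    newpath->next = pathlist;
    return newpath;
}

/*
 * Build one parent path per child path, carrying the child's ordering
 * through.  "workspace" must have room for one entry per child path.
 */
static ToyPath *
toy_paths_for_child(const ToyPath *child_paths, ToyPath *workspace)
{
    ToyPath    *result = NULL;
    int         i = 0;

    for (const ToyPath *sub = child_paths; sub != NULL; sub = sub->next, i++)
    {
        workspace[i].cost = sub->cost + 1.0;    /* pretend scan overhead */
        workspace[i].sortcol = sub->sortcol;    /* ordering carries through */
        workspace[i].next = NULL;
        result = toy_add_path(result, &workspace[i]);
    }
    return result;
}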
2002-05-12 22:10:05 +02:00
/*
 * set_function_pathlist
 *		Build the (single) access path for a function RTE
 */
static void
set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Relids      required_outer;
Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-22 01:37:02 +01:00
    List       *pathkeys = NIL;

    /*
     * We don't support pushing join clauses into the quals of a function
     * scan, but it could still have required parameterization due to LATERAL
     * refs in the function expression.
     */
    required_outer = rel->lateral_relids;
2013-11-22 01:37:02 +01:00
    /*
     * The result is considered unordered unless ORDINALITY was used, in which
     * case it is ordered by the ordinal column (the last one).  See if we
     * care, by checking for uses of that Var in equivalence classes.
     */
    if (rte->funcordinality)
    {
        AttrNumber  ordattno = rel->max_attr;
        Var        *var = NULL;
        ListCell   *lc;
Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-19 02:01:49 +01:00
        /*
         * Is there a Var for it in rel's targetlist?  If not, the query did
         * not reference the ordinality column, or at least not in any way
         * that would be interesting for sorting.
         */
        foreach(lc, rel->reltarget->exprs)
        {
            Var        *node = (Var *) lfirst(lc);

            /* checking varno/varlevelsup is just paranoia */
            if (IsA(node, Var) &&
                node->varattno == ordattno &&
                node->varno == rel->relid &&
                node->varlevelsup == 0)
            {
                var = node;
                break;
            }
        }

        /*
         * Try to build pathkeys for this Var with int8 sorting.  We tell
         * build_expression_pathkey not to build any new equivalence class; if
         * the Var isn't already mentioned in some EC, it means that nothing
         * cares about the ordering.
         */
        if (var)
            pathkeys = build_expression_pathkey(root,
                                                (Expr *) var,
                                                NULL,   /* below outer joins */
                                                Int8LessOperator,
                                                rel->relids,
                                                false);
    }
2002-05-12 22:10:05 +02:00
    /* Generate appropriate path */
    add_path(rel, create_functionscan_path(root, rel,
                                           pathkeys, required_outer));
}
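/*
 * A standalone sketch (hypothetical names and types, not the PostgreSQL API)
 * of the probe above: scan an output-column list for a Var-like entry that
 * matches the ordinality column, and only build sort information if someone
 * actually references that column.
 */
#include <stddef.h>

typedef struct ToyTargetEntry
{
    int         varno;          /* which relation the column comes from */
    int         varattno;       /* column number within that relation */
} ToyTargetEntry;

/* Return the matching entry, or NULL if the ordinal column is unreferenced. */
static const ToyTargetEntry *
toy_find_ordinality_var(const ToyTargetEntry *tlist, int ntlist,
                        int relid, int ordattno)
{
    for (int i = 0; i < ntlist; i++)
    {
        if (tlist[i].varno == relid && tlist[i].varattno == ordattno)
            return &tlist[i];
    }
    return NULL;                /* nothing cares about the ordering */
}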
2006-08-02 03:59:48 +02:00
/*
 * set_values_pathlist
 *		Build the (single) access path for a VALUES RTE
 */
static void
set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Relids      required_outer;

    /*
     * We don't support pushing join clauses into the quals of a values scan,
     * but it could still have required parameterization due to LATERAL refs
     * in the values expressions.
     */
    required_outer = rel->lateral_relids;

    /* Generate appropriate path */
    add_path(rel, create_valuesscan_path(root, rel, required_outer));
}

2017-03-08 16:39:37 +01:00
/*
 * set_tablefunc_pathlist
 *		Build the (single) access path for a table func RTE
 */
static void
set_tablefunc_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Relids      required_outer;

    /*
     * We don't support pushing join clauses into the quals of a tablefunc
     * scan, but it could still have required parameterization due to LATERAL
     * refs in the function expression.
     */
    required_outer = rel->lateral_relids;

    /* Generate appropriate path */
    add_path(rel, create_tablefuncscan_path(root, rel,
                                            required_outer));
}

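/*
 * Illustrative example (not from the original source): in
 *   SELECT * FROM a, LATERAL generate_series(1, a.n) g
 * the function scan's expression references a, so its only path must be
 * parameterized by a (required_outer = {a}); with no LATERAL reference,
 * lateral_relids is empty and the single generated path is unparameterized.
 * The same rule is shared by the values, tablefunc, CTE, named-tuplestore,
 * and Result scan builders in this part of the file.
 */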
2008-10-04 23:56:55 +02:00
/*
 * set_cte_pathlist
 *		Build the (single) access path for a non-self-reference CTE RTE
 *
 * There's no need for a separate set_cte_size phase, since we don't
 * support join-qual-parameterized paths for CTEs.
 */
static void
set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    Plan       *cteplan;
    PlannerInfo *cteroot;
    Index       levelsup;
    int         ndx;
    ListCell   *lc;
    int         plan_id;
    Relids      required_outer;
    /*
     * Find the referenced CTE, and locate the plan previously made for it.
     */
    levelsup = rte->ctelevelsup;
    cteroot = root;
    while (levelsup-- > 0)
    {
        cteroot = cteroot->parent_root;
        if (!cteroot)           /* shouldn't happen */
            elog(ERROR, "bad levelsup for CTE \"%s\"", rte->ctename);
    }

    /*
     * Note: cte_plan_ids can be shorter than cteList, if we are still working
     * on planning the CTEs (ie, this is a side-reference from another CTE).
     * So we mustn't use forboth here.
     */
    ndx = 0;
    foreach(lc, cteroot->parse->cteList)
    {
        CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);

        if (strcmp(cte->ctename, rte->ctename) == 0)
            break;
        ndx++;
    }
    if (lc == NULL)             /* shouldn't happen */
        elog(ERROR, "could not find CTE \"%s\"", rte->ctename);
    if (ndx >= list_length(cteroot->cte_plan_ids))
        elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
    plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
    Assert(plan_id > 0);
    cteplan = (Plan *) list_nth(root->glob->subplans, plan_id - 1);

    /* Mark rel with estimated output rows, width, etc */
2016-03-07 21:58:22 +01:00
    set_cte_size_estimates(root, rel, cteplan->plan_rows);

    /*
     * We don't support pushing join clauses into the quals of a CTE scan, but
     * it could still have required parameterization due to LATERAL refs in
     * its tlist.
     */
    required_outer = rel->lateral_relids;

    /* Generate appropriate path */
    add_path(rel, create_ctescan_path(root, rel, required_outer));
}

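/*
 * A standalone sketch (hypothetical names and types, not the PostgreSQL API)
 * of the lookup above: climb ctelevelsup parent links to the query level
 * that owns the CTE, find the CTE's index by name, then map that index
 * through a plan-id list.
 */
#include <stddef.h>
#include <string.h>

typedef struct ToyPlannerLevel
{
    struct ToyPlannerLevel *parent; /* next outer query level, or NULL */
    const char **cte_names;         /* CTE names defined at this level */
    const int  *cte_plan_ids;       /* 1-based ids, parallel to cte_names */
    int         nctes;
} ToyPlannerLevel;

/* Return the subplan id for "name" seen "levelsup" levels up, or -1. */
static int
toy_lookup_cte_plan(const ToyPlannerLevel *root, const char *name, int levelsup)
{
    const ToyPlannerLevel *lvl = root;

    while (levelsup-- > 0)
    {
        lvl = lvl->parent;
        if (lvl == NULL)
            return -1;          /* bad levelsup: shouldn't happen */
    }
    for (int i = 0; i < lvl->nctes; i++)
    {
        if (strcmp(lvl->cte_names[i], name) == 0)
            return lvl->cte_plan_ids[i];
    }
    return -1;                  /* CTE not found: shouldn't happen */
}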
2017-04-01 06:17:18 +02:00
/*
 * set_namedtuplestore_pathlist
 *		Build the (single) access path for a named tuplestore RTE
 *
 * There's no need for a separate set_namedtuplestore_size phase, since we
 * don't support join-qual-parameterized paths for tuplestores.
 */
static void
set_namedtuplestore_pathlist(PlannerInfo *root, RelOptInfo *rel,
                             RangeTblEntry *rte)
{
    Relids      required_outer;

    /* Mark rel with estimated output rows, width, etc */
    set_namedtuplestore_size_estimates(root, rel);

    /*
     * We don't support pushing join clauses into the quals of a tuplestore
     * scan, but it could still have required parameterization due to LATERAL
     * refs in its tlist.
     */
    required_outer = rel->lateral_relids;

    /* Generate appropriate path */
    add_path(rel, create_namedtuplestorescan_path(root, rel, required_outer));

    /* Select cheapest path (pretty easy in this case...) */
    set_cheapest(rel);
}

In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
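/*
 * Illustrative example (not from the original source): under this scheme a
 * query like "SELECT 2 + 2" is planned as if it read from a one-row,
 * zero-column dummy relation (an RTE_RESULT), and the scan plan produced
 * for it is a Result node; the old ResultPath, now used only for degenerate
 * grouping cases, is what was renamed to GroupResultPath.
 */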
/*
 * set_result_pathlist
 *		Build the (single) access path for an RTE_RESULT RTE
 *
 * There's no need for a separate set_result_size phase, since we
 * don't support join-qual-parameterized paths for these RTEs.
 */
static void
set_result_pathlist(PlannerInfo *root, RelOptInfo *rel,
                    RangeTblEntry *rte)
{
    Relids      required_outer;

    /* Mark rel with estimated output rows, width, etc */
    set_result_size_estimates(root, rel);

    /*
     * We don't support pushing join clauses into the quals of a Result scan,
     * but it could still have required parameterization due to LATERAL refs
     * in its tlist.
     */
    required_outer = rel->lateral_relids;

    /* Generate appropriate path */
    add_path(rel, create_resultscan_path(root, rel, required_outer));

    /* Select cheapest path (pretty easy in this case...) */
    set_cheapest(rel);
}

2008-10-04 23:56:55 +02:00
/*
 * set_worktable_pathlist
 *		Build the (single) access path for a self-reference CTE RTE
 *
 * There's no need for a separate set_worktable_size phase, since we don't
 * support join-qual-parameterized paths for CTEs.
 */
static void
set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
2016-03-07 21:58:22 +01:00
    Path       *ctepath;
    PlannerInfo *cteroot;
    Index       levelsup;
    Relids      required_outer;
2016-03-07 21:58:22 +01:00
    /*
     * We need to find the non-recursive term's path, which is in the plan
     * level that's processing the recursive UNION, which is one level *below*
     * where the CTE comes from.
     */
    levelsup = rte->ctelevelsup;
    if (levelsup == 0)          /* shouldn't happen */
        elog(ERROR, "bad levelsup for CTE \"%s\"", rte->ctename);
    levelsup--;
    cteroot = root;
    while (levelsup-- > 0)
    {
        cteroot = cteroot->parent_root;
        if (!cteroot)           /* shouldn't happen */
            elog(ERROR, "bad levelsup for CTE \"%s\"", rte->ctename);
    }
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
ctepath = cteroot->non_recursive_path;
|
|
|
|
if (!ctepath) /* shouldn't happen */
|
|
|
|
elog(ERROR, "could not find path for CTE \"%s\"", rte->ctename);
|
2008-10-04 23:56:55 +02:00
|
|
|
|
|
|
|
/* Mark rel with estimated output rows, width, etc */
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
set_cte_size_estimates(root, rel, ctepath->rows);
|
2008-10-04 23:56:55 +02:00
|
|
|
|
2012-08-27 04:48:55 +02:00
|
|
|
/*
|
|
|
|
* We don't support pushing join clauses into the quals of a worktable
|
|
|
|
* scan, but it could still have required parameterization due to LATERAL
|
2013-08-18 02:22:37 +02:00
|
|
|
* refs in its tlist. (I'm not sure this is actually possible given the
|
|
|
|
* restrictions on recursive references, but it's easy enough to support.)
|
2012-08-27 04:48:55 +02:00
|
|
|
*/
|
|
|
|
required_outer = rel->lateral_relids;
|
|
|
|
|
2008-10-04 23:56:55 +02:00
|
|
|
/* Generate appropriate path */
|
2012-08-27 04:48:55 +02:00
|
|
|
add_path(rel, create_worktablescan_path(root, rel, required_outer));
|
2008-10-04 23:56:55 +02:00
|
|
|
}
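
/*
 * Worked example of the levelsup arithmetic above (editorial note): if the
 * worktable reference has rte->ctelevelsup == 2, the CTE is defined two
 * query levels up, but the recursive UNION is processed one level below
 * that; so after the initial levelsup-- we climb exactly one parent_root
 * link to reach the PlannerInfo whose non_recursive_path we need.
 */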

/*
 * generate_gather_paths
 *      Generate parallel access paths for a relation by pushing a Gather or
 *      Gather Merge on top of a partial path.
 *
 * This must not be called until after we're done creating all partial paths
 * for the specified relation.  (Otherwise, add_partial_path might delete a
 * path that some GatherPath or GatherMergePath has a reference to.)
 *
 * If we're generating paths for a scan or join relation, override_rows will
 * be false, and we'll just use the relation's size estimate.  When we're
 * being called for a partially-grouped path, though, we need to override
 * the rowcount estimate.  (It's not clear that the particular value we're
 * using here is actually best, but the underlying rel has no estimate so
 * we must do something.)
 */
void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
    Path       *cheapest_partial_path;
    Path       *simple_gather_path;
    ListCell   *lc;
    double      rows;
    double     *rowsp = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;

    /* Should we override the rel's rowcount estimate? */
    if (override_rows)
        rowsp = &rows;

    /*
     * The output of Gather is always unsorted, so there's only one partial
     * path of interest: the cheapest one.  That will be the one at the front
     * of partial_pathlist because of the way add_partial_path works.
     */
    cheapest_partial_path = linitial(rel->partial_pathlist);
    rows =
        cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
    simple_gather_path = (Path *)
        create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
                           NULL, rowsp);
    add_path(rel, simple_gather_path);

    /*
     * For each useful ordering, we can consider an order-preserving Gather
     * Merge.
     */
    foreach(lc, rel->partial_pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);
        GatherMergePath *path;

        if (subpath->pathkeys == NIL)
            continue;

        rows = subpath->rows * subpath->parallel_workers;
        path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
                                        subpath->pathkeys, NULL, rowsp);
        add_path(rel, &path->path);
    }
}
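
#ifdef NOT_USED
/*
 * A minimal sketch of how a caller drives generate_gather_paths(); it
 * assumes rel->partial_pathlist is already fully populated, per the rule
 * in the header comment above.  The function name example_gather_rel is
 * hypothetical, not part of this file.
 */
static void
example_gather_rel(PlannerInfo *root, RelOptInfo *rel)
{
    /* Build Gather / Gather Merge paths over the existing partial paths. */
    generate_gather_paths(root, rel, false);

    /* Then pick the cheapest of all paths now attached to the rel. */
    set_cheapest(rel);
}
#endif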

/*
 * get_useful_pathkeys_for_relation
 *      Determine which orderings of a relation might be useful.
 *
 * Getting data in sorted order can be useful either because the requested
 * order matches the final output ordering for the overall query we're
 * planning, or because it enables an efficient merge join.  Here, we try
 * to figure out which pathkeys to consider.
 *
 * This allows us to do incremental sort on top of an index scan under a
 * gather merge node, i.e. parallelized.
 *
 * XXX At the moment this can only ever return a list with a single element,
 * because it looks at query_pathkeys only.  So we might return the pathkeys
 * directly, but it seems plausible we'll want to consider other orderings
 * in the future.  For example, we might want to consider pathkeys useful for
 * merge joins.
 */
static List *
get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
{
    List       *useful_pathkeys_list = NIL;

    /*
     * Considering query_pathkeys is always worth it, because it might allow
     * us to avoid a total sort when we have a partially presorted path
     * available.
     */
    if (root->query_pathkeys)
    {
        ListCell   *lc;
        int         npathkeys = 0;  /* useful pathkeys */

        foreach(lc, root->query_pathkeys)
        {
            PathKey    *pathkey = (PathKey *) lfirst(lc);
            EquivalenceClass *pathkey_ec = pathkey->pk_eclass;

            /*
             * We can only build an Incremental Sort for pathkeys which
             * contain an EC member in the current relation, so ignore any
             * suffix of the list as soon as we find a pathkey without an EC
             * member in the relation.
             *
             * By still returning the prefix of the pathkeys list that does
             * meet the criteria of EC membership in the current relation, we
             * enable not just an incremental sort on the entirety of
             * query_pathkeys but also incremental sort below a JOIN.
             */
            if (!find_em_expr_for_rel(pathkey_ec, rel))
                break;

            npathkeys++;
        }

        /*
         * If the whole query_pathkeys list matches, append it directly; that
         * allows comparing pathkeys cheaply by comparing list pointers.  If
         * we have to truncate the pathkeys, we must make a copy first.
         */
        if (npathkeys == list_length(root->query_pathkeys))
            useful_pathkeys_list = lappend(useful_pathkeys_list,
                                           root->query_pathkeys);
        else if (npathkeys > 0)
            useful_pathkeys_list = lappend(useful_pathkeys_list,
                                           list_truncate(list_copy(root->query_pathkeys),
                                                         npathkeys));
    }

    return useful_pathkeys_list;
}
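
/*
 * For example: if query_pathkeys is (a, b, c) but only a and b have
 * EquivalenceClass members in this rel, the loop above stops at c with
 * npathkeys = 2, and the returned list holds the truncated copy (a, b)
 * rather than query_pathkeys itself.
 */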

/*
 * generate_useful_gather_paths
 *      Generate parallel access paths for a relation by pushing a Gather or
 *      Gather Merge on top of a partial path.
 *
 * Unlike plain generate_gather_paths, this looks not only at the pathkeys
 * of the input paths (aiming to preserve the ordering), but also at
 * orderings that might be useful to nodes above the gather merge node, and
 * tries to add a sort (regular or incremental) to provide that.
 */
void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
    ListCell   *lc;
    double      rows;
    double     *rowsp = NULL;
    List       *useful_pathkeys_list = NIL;
    Path       *cheapest_partial_path = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;

    /* Should we override the rel's rowcount estimate? */
    if (override_rows)
        rowsp = &rows;

    /* generate the regular gather (merge) paths */
    generate_gather_paths(root, rel, override_rows);

    /* consider incremental sort for interesting orderings */
    useful_pathkeys_list = get_useful_pathkeys_for_relation(root, rel);

    /* used for explicit (full) sort paths */
    cheapest_partial_path = linitial(rel->partial_pathlist);

    /*
     * Consider incremental sort paths for each interesting ordering.
     */
    foreach(lc, useful_pathkeys_list)
    {
        List       *useful_pathkeys = lfirst(lc);
        ListCell   *lc2;
        bool        is_sorted;
        int         presorted_keys;

        foreach(lc2, rel->partial_pathlist)
        {
            Path       *subpath = (Path *) lfirst(lc2);
            GatherMergePath *path;

            /*
             * If the path has no ordering at all, then we can't use either
             * incremental sort or rely on implicit sorting with a gather
             * merge.
             */
            if (subpath->pathkeys == NIL)
                continue;

            is_sorted = pathkeys_count_contained_in(useful_pathkeys,
                                                    subpath->pathkeys,
                                                    &presorted_keys);

            /*
             * We don't need to consider the case where a subpath is already
             * fully sorted because generate_gather_paths already creates a
             * gather merge path for every subpath that has pathkeys present.
             *
             * But since the subpath is already sorted, we know we don't need
             * to consider adding a sort (of either kind) on top of it, so we
             * can continue here.
             */
            if (is_sorted)
                continue;

            /*
             * Consider regular sort for the cheapest partial path (for each
             * useful ordering).  We know the path is not sorted, because we'd
             * not get here otherwise.
             *
             * This is not redundant with the gather paths created in
             * generate_gather_paths, because that doesn't generate ordered
             * output.  Here we add an explicit sort to match the useful
             * ordering.
             */
            if (cheapest_partial_path == subpath)
            {
                Path       *tmp;

                tmp = (Path *) create_sort_path(root,
                                                rel,
                                                subpath,
                                                useful_pathkeys,
                                                -1.0);

                rows = tmp->rows * tmp->parallel_workers;

                path = create_gather_merge_path(root, rel,
                                                tmp,
                                                rel->reltarget,
                                                tmp->pathkeys,
                                                NULL,
                                                rowsp);

                add_path(rel, &path->path);

                /* Fall through */
            }

            /*
             * Consider incremental sort, but only when the subpath is
             * already partially sorted on a pathkey prefix.
             */
            if (enable_incrementalsort && presorted_keys > 0)
            {
                Path       *tmp;

                /*
                 * We should have already excluded pathkeys of length 1
                 * because then presorted_keys > 0 would imply is_sorted was
                 * true.
                 */
                Assert(list_length(useful_pathkeys) != 1);

                tmp = (Path *) create_incremental_sort_path(root,
                                                            rel,
                                                            subpath,
                                                            useful_pathkeys,
                                                            presorted_keys,
                                                            -1);

                path = create_gather_merge_path(root, rel,
                                                tmp,
                                                rel->reltarget,
                                                tmp->pathkeys,
                                                NULL,
                                                rowsp);

                add_path(rel, &path->path);
            }
        }
    }
}
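
/*
 * For example: with useful ordering (a, b), a cheapest partial path ordered
 * by (c) gets a full Sort on (a, b) under a Gather Merge (first branch),
 * while a partial path already ordered by (a) alone gets an Incremental
 * Sort with presorted_keys = 1 (second branch, when enable_incrementalsort
 * is on).  Completely unordered partial paths are skipped by the
 * pathkeys == NIL test above.
 */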

/*
 * make_rel_from_joinlist
 *      Build access paths using a "joinlist" to guide the join path search.
 *
 * See comments for deconstruct_jointree() for definition of the joinlist
 * data structure.
 */
static RelOptInfo *
make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
{
    int         levels_needed;
    List       *initial_rels;
    ListCell   *jl;

    /*
     * Count the number of child joinlist nodes.  This is the depth of the
     * dynamic-programming algorithm we must employ to consider all ways of
     * joining the child nodes.
     */
    levels_needed = list_length(joinlist);

    if (levels_needed <= 0)
        return NULL;            /* nothing to do? */

    /*
     * Construct a list of rels corresponding to the child joinlist nodes.
     * This may contain both base rels and rels constructed according to
     * sub-joinlists.
     */
    initial_rels = NIL;
    foreach(jl, joinlist)
    {
        Node       *jlnode = (Node *) lfirst(jl);
        RelOptInfo *thisrel;

        if (IsA(jlnode, RangeTblRef))
        {
            int         varno = ((RangeTblRef *) jlnode)->rtindex;

            thisrel = find_base_rel(root, varno);
        }
        else if (IsA(jlnode, List))
        {
            /* Recurse to handle subproblem */
            thisrel = make_rel_from_joinlist(root, (List *) jlnode);
        }
        else
        {
            elog(ERROR, "unrecognized joinlist node type: %d",
                 (int) nodeTag(jlnode));
            thisrel = NULL;     /* keep compiler quiet */
        }

        initial_rels = lappend(initial_rels, thisrel);
    }

    if (levels_needed == 1)
    {
        /*
         * Single joinlist node, so we're done.
         */
        return (RelOptInfo *) linitial(initial_rels);
    }
    else
    {
        /*
         * Consider the different orders in which we could join the rels,
         * using a plugin, GEQO, or the regular join search code.
         *
         * We put the initial_rels list into a PlannerInfo field because
         * has_legal_joinclause() needs to look at it (ugly :-().
         */
        root->initial_rels = initial_rels;

        if (join_search_hook)
            return (*join_search_hook) (root, levels_needed, initial_rels);
        else if (enable_geqo && levels_needed >= geqo_threshold)
            return geqo(root, levels_needed, initial_rels);
        else
            return standard_join_search(root, levels_needed, initial_rels);
    }
}
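
/*
 * For example: a joinlist of the form (RangeTblRef 1, RangeTblRef 2,
 * sub-joinlist (3, 4)) makes the function above recurse to collapse the
 * {3,4} subproblem into a single joinrel first, then run a three-way
 * search (plugin, GEQO, or standard) over rel 1, rel 2, and that joinrel.
 */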

/*
 * standard_join_search
 *      Find possible joinpaths for a query by successively finding ways
 *      to join component relations into join relations.
 *
 * 'levels_needed' is the number of iterations needed, ie, the number of
 *      independent jointree items in the query.  This is > 1.
 *
 * 'initial_rels' is a list of RelOptInfo nodes for each independent
 *      jointree item.  These are the components to be joined together.
 *      Note that levels_needed == list_length(initial_rels).
 *
 * Returns the final level of join relations, i.e., the relation that is
 * the result of joining all the original relations together.
 * At least one implementation path must be provided for this relation and
 * all required sub-relations.
 *
 * To support loadable plugins that modify planner behavior by changing the
 * join searching algorithm, we provide a hook variable that lets a plugin
 * replace or supplement this function.  Any such hook must return the same
 * final join relation as the standard code would, but it might have a
 * different set of implementation paths attached, and only the sub-joinrels
 * needed for these paths need have been instantiated.
 *
 * Note to plugin authors: the functions invoked during standard_join_search()
 * modify root->join_rel_list and root->join_rel_hash.  If you want to do more
 * than one join-order search, you'll probably need to save and restore the
 * original states of those data structures.  See geqo_eval() for an example.
 */
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
    int         lev;
    RelOptInfo *rel;

    /*
     * This function cannot be invoked recursively within any one planning
     * problem, so join_rel_level[] can't be in use already.
     */
    Assert(root->join_rel_level == NULL);

    /*
     * We employ a simple "dynamic programming" algorithm: we first find all
     * ways to build joins of two jointree items, then all ways to build joins
     * of three items (from two-item joins and single items), then four-item
     * joins, and so on until we have considered all ways to join all the
     * items into one rel.
     *
     * root->join_rel_level[j] is a list of all the j-item rels.  Initially we
     * set root->join_rel_level[1] to represent all the single-jointree-item
     * relations.
     */
    root->join_rel_level = (List **) palloc0((levels_needed + 1) * sizeof(List *));

    root->join_rel_level[1] = initial_rels;

    for (lev = 2; lev <= levels_needed; lev++)
    {
        ListCell   *lc;

        /*
         * Determine all possible pairs of relations to be joined at this
         * level, and build paths for making each one from every available
         * pair of lower-level relations.
         */
        join_search_one_level(root, lev);

        /*
         * Run generate_partitionwise_join_paths() and
         * generate_useful_gather_paths() for each just-processed joinrel.
         * We could not do this earlier because both regular and partial
         * paths can get added to a particular joinrel multiple times within
         * join_search_one_level.
         *
         * After that, we're done creating paths for the joinrel, so run
         * set_cheapest().
         */
        foreach(lc, root->join_rel_level[lev])
        {
            rel = (RelOptInfo *) lfirst(lc);

            /* Create paths for partitionwise joins. */
            generate_partitionwise_join_paths(root, rel);

            /*
             * Except for the topmost scan/join rel, consider gathering
             * partial paths.  We'll do the same for the topmost scan/join rel
             * once we know the final targetlist (see grouping_planner).
             */
            if (lev < levels_needed)
                generate_useful_gather_paths(root, rel, false);

            /* Find and save the cheapest paths for this rel */
            set_cheapest(rel);

#ifdef OPTIMIZER_DEBUG
            debug_print_rel(root, rel);
#endif
        }
    }

    /*
     * We should have a single rel at the final level.
     */
    if (root->join_rel_level[levels_needed] == NIL)
        elog(ERROR, "failed to build any %d-way joins", levels_needed);
    Assert(list_length(root->join_rel_level[levels_needed]) == 1);

    rel = (RelOptInfo *) linitial(root->join_rel_level[levels_needed]);

    root->join_rel_level = NULL;

    return rel;
}
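
#ifdef NOT_USED
/*
 * A minimal sketch of a join_search_hook implementation, which simply
 * delegates to the standard search.  A real plugin would assign this to
 * join_search_hook from its _PG_init(); the name my_join_search is
 * hypothetical, not part of this file.
 */
static RelOptInfo *
my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
    /* A plugin could reorder initial_rels or prune the search space here. */
    return standard_join_search(root, levels_needed, initial_rels);
}
#endif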

/*****************************************************************************
 *          PUSHING QUALS DOWN INTO SUBQUERIES
 *****************************************************************************/

/*
 * subquery_is_pushdown_safe - is a subquery safe for pushing down quals?
 *
 * subquery is the particular component query being checked.  topquery
 * is the top component of a set-operations tree (the same Query if no
 * set-op is involved).
 *
 * Conditions checked here:
 *
 * 1. If the subquery has a LIMIT clause, we must not push down any quals,
 * since that could change the set of rows returned.
 *
 * 2. If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot push
 * quals into it, because that could change the results.
 *
 * 3. If the subquery uses DISTINCT, we cannot push volatile quals into it.
 * This is because upper-level quals should semantically be evaluated only
 * once per distinct row, not once per original row, and if the qual is
 * volatile then extra evaluations could change the results.  (This issue
 * does not apply to other forms of aggregation such as GROUP BY, because
 * when those are present we push into HAVING not WHERE, so that the quals
 * are still applied after aggregation.)
 *
 * 4. If the subquery contains window functions, we cannot push volatile quals
 * into it.  The issue here is a bit different from DISTINCT: a volatile qual
 * might succeed for some rows of a window partition and fail for others,
 * thereby changing the partition contents and thus the window functions'
 * results for rows that remain.
 *
 * 5. If the subquery contains any set-returning functions in its targetlist,
 * we cannot push volatile quals into it.  That would push them below the SRFs
 * and thereby change the number of times they are evaluated.  Also, a
 * volatile qual could succeed for some SRF output rows and fail for others,
 * a behavior that cannot occur if it's evaluated before SRF expansion.
 *
 * In addition, we make several checks on the subquery's output columns to see
 * if it is safe to reference them in pushed-down quals.  If output column k
 * is found to be unsafe to reference, we set safetyInfo->unsafeColumns[k]
 * to true, but we don't reject the subquery overall since column k might not
 * be referenced by some/all quals.  The unsafeColumns[] array will be
 * consulted later by qual_is_pushdown_safe().  It's better to do it this way
 * than to make the checks directly in qual_is_pushdown_safe(), because when
 * the subquery involves set operations we have to check the output
 * expressions in each arm of the set op.
 *
 * Note: pushing quals into a DISTINCT subquery is theoretically dubious:
 * we're effectively assuming that the quals cannot distinguish values that
 * the DISTINCT's equality operator sees as equal, yet there are many
 * counterexamples to that assumption.  However use of such a qual with a
 * DISTINCT subquery would be unsafe anyway, since there's no guarantee which
 * "equal" value will be chosen as the output value by the DISTINCT operation.
 * So we don't worry too much about that.  Another objection is that if the
 * qual is expensive to evaluate, running it for each original row might cost
 * more than we save by eliminating rows before the DISTINCT step.  But it
 * would be very hard to estimate that at this stage, and in practice pushdown
 * seldom seems to make things worse, so we ignore that problem too.
 *
 * Note: likewise, pushing quals into a subquery with window functions is a
 * bit dubious: the quals might remove some rows of a window partition while
 * leaving others, causing changes in the window functions' results for the
 * surviving rows.  We insist that such a qual reference only partitioning
 * columns, but again that only protects us if the qual does not distinguish
 * values that the partitioning equality operator sees as equal.  The risks
 * here are perhaps larger than for DISTINCT, since no de-duplication of rows
 * occurs and thus there is no theoretical problem with such a qual.  But
 * we'll do this anyway because the potential performance benefits are very
 * large, and we've seen no field complaints about the longstanding comparable
 * behavior with DISTINCT.
 */
static bool
subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                          pushdown_safety_info *safetyInfo)
{
    SetOperationStmt *topop;

    /* Check point 1 */
    if (subquery->limitOffset != NULL || subquery->limitCount != NULL)
        return false;

    /* Check points 3, 4, and 5 */
    if (subquery->distinctClause ||
        subquery->hasWindowFuncs ||
        subquery->hasTargetSRFs)
        safetyInfo->unsafeVolatile = true;

    /*
     * If we're at a leaf query, check for unsafe expressions in its target
     * list, and mark any unsafe ones in unsafeColumns[].  (Non-leaf nodes in
     * setop trees have only simple Vars in their tlists, so no need to check
     * them.)
     */
    if (subquery->setOperations == NULL)
        check_output_expressions(subquery, safetyInfo);

    /* Are we at top level, or looking at a setop component? */
    if (subquery == topquery)
    {
        /* Top level, so check any component queries */
        if (subquery->setOperations != NULL)
            if (!recurse_pushdown_safe(subquery->setOperations, topquery,
                                       safetyInfo))
                return false;
    }
    else
    {
        /* Setop component must not have more components (too weird) */
        if (subquery->setOperations != NULL)
            return false;
        /* Check whether setop component output types match top level */
        topop = castNode(SetOperationStmt, topquery->setOperations);
        Assert(topop);
        compare_tlist_datatypes(subquery->targetList,
                                topop->colTypes,
                                safetyInfo);
    }
    return true;
}
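
/*
 * For example, in
 *      SELECT * FROM (SELECT a FROM t LIMIT 10) ss WHERE a > 0
 * point 1 makes pushdown unsafe: filtering before the LIMIT could change
 * which ten rows the subquery returns.
 */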

/*
 * Helper routine to recurse through setOperations tree
 */
static bool
recurse_pushdown_safe(Node *setOp, Query *topquery,
                      pushdown_safety_info *safetyInfo)
{
    if (IsA(setOp, RangeTblRef))
    {
        RangeTblRef *rtr = (RangeTblRef *) setOp;
        RangeTblEntry *rte = rt_fetch(rtr->rtindex, topquery->rtable);
        Query      *subquery = rte->subquery;

        Assert(subquery != NULL);
        return subquery_is_pushdown_safe(subquery, topquery, safetyInfo);
    }
    else if (IsA(setOp, SetOperationStmt))
    {
        SetOperationStmt *op = (SetOperationStmt *) setOp;

        /* EXCEPT is no good (point 2 for subquery_is_pushdown_safe) */
        if (op->op == SETOP_EXCEPT)
            return false;
        /* Else recurse */
        if (!recurse_pushdown_safe(op->larg, topquery, safetyInfo))
            return false;
        if (!recurse_pushdown_safe(op->rarg, topquery, safetyInfo))
            return false;
    }
    else
    {
        elog(ERROR, "unrecognized node type: %d",
             (int) nodeTag(setOp));
    }
    return true;
}

/*
 * check_output_expressions - check subquery's output expressions for safety
 *
 * There are several cases in which it's unsafe to push down an upper-level
 * qual if it references a particular output column of a subquery.  We check
 * each output column of the subquery and set unsafeColumns[k] to true if
 * that column is unsafe for a pushed-down qual to reference.  The conditions
 * checked here are:
 *
 * 1. We must not push down any quals that refer to subselect outputs that
 * return sets, else we'd introduce functions-returning-sets into the
 * subquery's WHERE/HAVING quals.
 *
 * 2. We must not push down any quals that refer to subselect outputs that
 * contain volatile functions, for fear of introducing strange results due
 * to multiple evaluation of a volatile function.
 *
 * 3. If the subquery uses DISTINCT ON, we must not push down any quals that
 * refer to non-DISTINCT output columns, because that could change the set
 * of rows returned.  (This condition is vacuous for DISTINCT, because then
 * there are no non-DISTINCT output columns, so we needn't check.  Note that
 * subquery_is_pushdown_safe already reported that we can't use volatile
 * quals if there's DISTINCT or DISTINCT ON.)
 *
 * 4. If the subquery has any window functions, we must not push down quals
 * that reference any output columns that are not listed in all the subquery's
 * window PARTITION BY clauses.  We can push down quals that use only
 * partitioning columns because they should succeed or fail identically for
 * every row of any one window partition, and totally excluding some
 * partitions will not change a window function's results for remaining
 * partitions.  (Again, this also requires nonvolatile quals, but
 * subquery_is_pushdown_safe handles that.)
 */
static void
check_output_expressions(Query *subquery, pushdown_safety_info *safetyInfo)
{
    ListCell   *lc;

    foreach(lc, subquery->targetList)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

        if (tle->resjunk)
            continue;           /* ignore resjunk columns */

        /* We need not check further if output col is already known unsafe */
        if (safetyInfo->unsafeColumns[tle->resno])
            continue;

        /* Functions returning sets are unsafe (point 1) */
        if (subquery->hasTargetSRFs &&
            expression_returns_set((Node *) tle->expr))
        {
            safetyInfo->unsafeColumns[tle->resno] = true;
            continue;
        }

        /* Volatile functions are unsafe (point 2) */
        if (contain_volatile_functions((Node *) tle->expr))
        {
            safetyInfo->unsafeColumns[tle->resno] = true;
            continue;
        }

        /* If subquery uses DISTINCT ON, check point 3 */
        if (subquery->hasDistinctOn &&
            !targetIsInSortList(tle, InvalidOid, subquery->distinctClause))
        {
            /* non-DISTINCT column, so mark it unsafe */
            safetyInfo->unsafeColumns[tle->resno] = true;
            continue;
        }

        /* If subquery uses window functions, check point 4 */
        if (subquery->hasWindowFuncs &&
            !targetIsInAllPartitionLists(tle, subquery))
        {
            /* not present in all PARTITION BY clauses, so mark it unsafe */
            safetyInfo->unsafeColumns[tle->resno] = true;
            continue;
        }
    }
}
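
/*
 * For example, in
 *      SELECT DISTINCT ON (a) a, b FROM t
 * point 3 marks column b unsafe: a pushed-down qual on b could change which
 * row survives as the representative for each distinct value of a.
 */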

/*
 * For subqueries using UNION/UNION ALL/INTERSECT/INTERSECT ALL, we can
 * push quals into each component query, but the quals can only reference
 * subquery columns that suffer no type coercions in the set operation.
 * Otherwise there are possible semantic gotchas.  So, we check the
 * component queries to see if any of them have output types different from
 * the top-level setop outputs.  unsafeColumns[k] is set true if column k
 * has different type in any component.
 *
 * We don't have to care about typmods here: the only allowed difference
 * between set-op input and output typmods is input is a specific typmod
 * and output is -1, and that does not require a coercion.
 *
 * tlist is a subquery tlist.
 * colTypes is an OID list of the top-level setop's output column types.
 * safetyInfo->unsafeColumns[] is the result array.
 */
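/*
 * For example (illustrative query): in
 *		SELECT i FROM int4_tab UNION SELECT n FROM numeric_tab
 * the first arm's integer column is coerced to numeric at the setop level.
 * A pushed-down qual was written against the numeric output, so applying
 * it directly to the uncoerced integer expression could change its
 * meaning; such a column is therefore marked unsafe here.
 */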
static void
compare_tlist_datatypes(List *tlist, List *colTypes,
						pushdown_safety_info *safetyInfo)
{
	ListCell   *l;
	ListCell   *colType = list_head(colTypes);

	foreach(l, tlist)
	{
		TargetEntry *tle = (TargetEntry *) lfirst(l);

		if (tle->resjunk)
			continue;			/* ignore resjunk columns */
		if (colType == NULL)
			elog(ERROR, "wrong number of tlist entries");
		if (exprType((Node *) tle->expr) != lfirst_oid(colType))
			safetyInfo->unsafeColumns[tle->resno] = true;
		colType = lnext(colTypes, colType);
	}
	if (colType != NULL)
		elog(ERROR, "wrong number of tlist entries");
}

/*
 * targetIsInAllPartitionLists
 *		True if the TargetEntry is listed in the PARTITION BY clause
 *		of every window defined in the query.
 *
 * It would be safe to ignore windows not actually used by any window
 * function, but it's not easy to get that info at this stage; and it's
 * unlikely to be useful to spend any extra cycles getting it, since
 * unreferenced window definitions are probably infrequent in practice.
 */
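/*
 * For example (illustrative query): with
 *		SELECT x, y, sum(z) OVER (PARTITION BY x) FROM tab
 * a qual on x excludes whole partitions and is pushdown-safe, whereas a
 * qual on y would remove individual rows from surviving partitions and
 * change the window aggregate's result.
 */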
static bool
targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
{
	ListCell   *lc;

	foreach(lc, query->windowClause)
	{
		WindowClause *wc = (WindowClause *) lfirst(lc);

		if (!targetIsInSortList(tle, InvalidOid, wc->partitionClause))
			return false;
	}
	return true;
}

/*
 * qual_is_pushdown_safe - is a particular qual safe to push down?
 *
 * qual is a restriction clause applying to the given subquery (whose RTE
 * has index rti in the parent query).
 *
 * Conditions checked here:
 *
 * 1. The qual must not contain any SubPlans (mainly because I'm not sure
 * it will work correctly: SubLinks will already have been transformed into
 * SubPlans in the qual, but not in the subquery).  Note that SubLinks that
 * transform to initplans are safe, and will be accepted here because what
 * we'll see in the qual is just a Param referencing the initplan output.
 *
 * 2. If unsafeVolatile is set, the qual must not contain any volatile
 * functions.
 *
 * 3. If unsafeLeaky is set, the qual must not contain any leaky functions
 * that are passed Var nodes, and therefore might reveal values from the
 * subquery as side effects.
 *
 * 4. The qual must not refer to the whole-row output of the subquery
 * (since there is no easy way to name that within the subquery itself).
 *
 * 5. The qual must not refer to any subquery output columns that were
 * found to be unsafe to reference by subquery_is_pushdown_safe().
 */
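/*
 * For example (illustrative quals against a subquery "ss"): "ss.a = 42"
 * can pass these tests, while "EXISTS (SELECT ...)" fails point 1,
 * "ss.a = random()" fails point 2 when unsafeVolatile is set, and a
 * whole-row test such as "ss IS NOT NULL" fails point 4.
 */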
static bool
qual_is_pushdown_safe(Query *subquery, Index rti, Node *qual,
					  pushdown_safety_info *safetyInfo)
{
	bool		safe = true;
	List	   *vars;
	ListCell   *vl;

	/* Refuse subselects (point 1) */
	if (contain_subplans(qual))
		return false;

	/* Refuse volatile quals if we found they'd be unsafe (point 2) */
	if (safetyInfo->unsafeVolatile &&
		contain_volatile_functions(qual))
		return false;

	/* Refuse leaky quals if told to (point 3) */
	if (safetyInfo->unsafeLeaky &&
		contain_leaked_vars(qual))
		return false;

	/*
	 * It would be unsafe to push down window function calls, but at least for
	 * the moment we could never see any in a qual anyhow.  (The same applies
	 * to aggregates, which we check for in pull_var_clause below.)
	 */
	Assert(!contain_window_function(qual));

	/*
	 * Examine all Vars used in clause; since it's a restriction clause, all
	 * such Vars must refer to subselect output columns.
	 */
	vars = pull_var_clause(qual, PVC_INCLUDE_PLACEHOLDERS);
	foreach(vl, vars)
	{
		Var		   *var = (Var *) lfirst(vl);

		/*
		 * XXX Punt if we find any PlaceHolderVars in the restriction clause.
		 * It's not clear whether a PHV could safely be pushed down, and even
		 * less clear whether such a situation could arise in any cases of
		 * practical interest anyway.  So for the moment, just refuse to push
		 * down.
		 */
		if (!IsA(var, Var))
		{
			safe = false;
			break;
		}

		Assert(var->varno == rti);
		Assert(var->varattno >= 0);

		/* Check point 4 */
		if (var->varattno == 0)
		{
			safe = false;
			break;
		}

		/* Check point 5 */
		if (safetyInfo->unsafeColumns[var->varattno])
		{
			safe = false;
			break;
		}
	}

	list_free(vars);

	return safe;
}

/*
 * subquery_push_qual - push down a qual that we have determined is safe
 */
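/*
 * For example (illustrative query): pushing "cnt > 10" into
 *		SELECT x, count(*) AS cnt FROM tab GROUP BY x
 * first rewrites the qual to "count(*) > 10", which must then be attached
 * as a HAVING qual, since an aggregate cannot appear in WHERE.
 */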
static void
subquery_push_qual(Query *subquery, RangeTblEntry *rte, Index rti, Node *qual)
{
	if (subquery->setOperations != NULL)
	{
		/* Recurse to push it separately to each component query */
		recurse_push_qual(subquery->setOperations, subquery,
						  rte, rti, qual);
	}
	else
	{
		/*
		 * We need to replace Vars in the qual (which must refer to outputs of
		 * the subquery) with copies of the subquery's targetlist expressions.
		 * Note that at this point, any uplevel Vars in the qual should have
		 * been replaced with Params, so they need no work.
		 *
		 * This step also ensures that when we are pushing into a setop tree,
		 * each component query gets its own copy of the qual.
		 */
		qual = ReplaceVarsFromTargetList(qual, rti, 0, rte,
										 subquery->targetList,
										 REPLACEVARS_REPORT_ERROR, 0,
										 &subquery->hasSubLinks);

		/*
		 * Now attach the qual to the proper place: normally WHERE, but if the
		 * subquery uses grouping or aggregation, put it in HAVING (since the
		 * qual really refers to the group-result rows).
		 */
		if (subquery->hasAggs || subquery->groupClause || subquery->groupingSets || subquery->havingQual)
			subquery->havingQual = make_and_qual(subquery->havingQual, qual);
		else
			subquery->jointree->quals =
				make_and_qual(subquery->jointree->quals, qual);

		/*
		 * We need not change the subquery's hasAggs or hasSubLinks flags,
		 * since we can't be pushing down any aggregates that weren't there
		 * before, and we don't push down subselects at all.
		 */
	}
}

/*
 * Helper routine to recurse through setOperations tree
 */
static void
recurse_push_qual(Node *setOp, Query *topquery,
				  RangeTblEntry *rte, Index rti, Node *qual)
{
	if (IsA(setOp, RangeTblRef))
	{
		RangeTblRef *rtr = (RangeTblRef *) setOp;
		RangeTblEntry *subrte = rt_fetch(rtr->rtindex, topquery->rtable);
		Query	   *subquery = subrte->subquery;

		Assert(subquery != NULL);
		subquery_push_qual(subquery, rte, rti, qual);
	}
	else if (IsA(setOp, SetOperationStmt))
	{
		SetOperationStmt *op = (SetOperationStmt *) setOp;

		recurse_push_qual(op->larg, topquery, rte, rti, qual);
		recurse_push_qual(op->rarg, topquery, rte, rti, qual);
	}
	else
	{
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(setOp));
	}
}

/*****************************************************************************
 *			SIMPLIFYING SUBQUERY TARGETLISTS
 *****************************************************************************/

/*
 * remove_unused_subquery_outputs
 *		Remove subquery targetlist items we don't need
 *
 * It's possible, even likely, that the upper query does not read all the
 * output columns of the subquery.  We can remove any such outputs that are
 * not needed by the subquery itself (e.g., as sort/group columns) and do not
 * affect semantics otherwise (e.g., volatile functions can't be removed).
 * This is useful not only because we might be able to remove expensive-to-
 * compute expressions, but because deletion of output columns might allow
 * optimizations such as join removal to occur within the subquery.
 *
 * To avoid affecting column numbering in the targetlist, we don't physically
 * remove unused tlist entries, but rather replace their expressions with NULL
 * constants.  This is implemented by modifying subquery->targetList.
 */
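/*
 * For example (illustrative query): if the upper query reads only column a
 * of the unflattenable subquery
 *		SELECT a, expensive_fn(b) AS eb FROM tab GROUP BY a, b
 * then eb (assuming expensive_fn is non-volatile and not set-returning) is
 * replaced by a NULL constant of the same type, saving its evaluation while
 * keeping the subquery's column numbering intact.
 */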
static void
remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
{
	Bitmapset  *attrs_used = NULL;
	ListCell   *lc;

	/*
	 * Do nothing if subquery has UNION/INTERSECT/EXCEPT: in principle we
	 * could update all the child SELECTs' tlists, but it seems not worth the
	 * trouble presently.
	 */
	if (subquery->setOperations)
		return;

	/*
	 * If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our
	 * time: all its output columns must be used in the distinctClause.
	 */
	if (subquery->distinctClause && !subquery->hasDistinctOn)
		return;

	/*
	 * Collect a bitmap of all the output column numbers used by the upper
	 * query.
	 *
	 * Add all the attributes needed for joins or final output.  Note: we must
	 * look at rel's targetlist, not the attr_needed data, because attr_needed
	 * isn't computed for inheritance child rels, cf set_append_rel_size().
	 * (XXX might be worth changing that sometime.)
	 */
	pull_varattnos((Node *) rel->reltarget->exprs, rel->relid, &attrs_used);

	/* Add all the attributes used by un-pushed-down restriction clauses. */
	foreach(lc, rel->baserestrictinfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

		pull_varattnos((Node *) rinfo->clause, rel->relid, &attrs_used);
	}

	/*
	 * If there's a whole-row reference to the subquery, we can't remove
	 * anything.
	 */
	if (bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, attrs_used))
		return;

	/*
	 * Run through the tlist and zap entries we don't need.  It's okay to
	 * modify the tlist items in-place because set_subquery_pathlist made a
	 * copy of the subquery.
	 */
	foreach(lc, subquery->targetList)
	{
		TargetEntry *tle = (TargetEntry *) lfirst(lc);
		Node	   *texpr = (Node *) tle->expr;

		/*
		 * If it has a sortgroupref number, it's used in some sort/group
		 * clause so we'd better not remove it.  Also, don't remove any
		 * resjunk columns, since their reason for being has nothing to do
		 * with anybody reading the subquery's output.  (It's likely that
		 * resjunk columns in a sub-SELECT would always have ressortgroupref
		 * set, but even if they don't, it seems imprudent to remove them.)
		 */
		if (tle->ressortgroupref || tle->resjunk)
			continue;

		/*
		 * If it's used by the upper query, we can't remove it.
		 */
		if (bms_is_member(tle->resno - FirstLowInvalidHeapAttributeNumber,
						  attrs_used))
			continue;

		/*
		 * If it contains a set-returning function, we can't remove it since
		 * that could change the number of rows returned by the subquery.
		 */
		if (subquery->hasTargetSRFs &&
			expression_returns_set(texpr))
			continue;

		/*
		 * If it contains volatile functions, we daren't remove it for fear
		 * that the user is expecting their side-effects to happen.
		 */
		if (contain_volatile_functions(texpr))
			continue;

		/*
		 * OK, we don't need it.  Replace the expression with a NULL constant.
		 * Preserve the exposed type of the expression, in case something
		 * looks at the rowtype of the subquery's result.
		 */
		tle->expr = (Expr *) makeNullConst(exprType(texpr),
										   exprTypmod(texpr),
										   exprCollation(texpr));
	}
}

/*
 * create_partial_bitmap_paths
 *		Build partial bitmap heap path for the relation
 */
void
create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
							Path *bitmapqual)
{
	int			parallel_workers;
	double		pages_fetched;

	/* Compute heap pages for bitmap heap scan */
	pages_fetched = compute_bitmap_pages(root, rel, bitmapqual, 1.0,
										 NULL, NULL);

	parallel_workers = compute_parallel_worker(rel, pages_fetched, -1,
											   max_parallel_workers_per_gather);

	if (parallel_workers <= 0)
		return;

	add_partial_path(rel, (Path *) create_bitmap_heap_path(root, rel,
															bitmapqual, rel->lateral_relids, 1.0, parallel_workers));
}

/*
 * Compute the number of parallel workers that should be used to scan a
 * relation.  We compute the parallel workers based on the size of the heap to
 * be scanned and the size of the index to be scanned, then choose a minimum
 * of those.
 *
 * "heap_pages" is the number of pages from the table that we expect to scan,
 * or -1 if we don't expect to scan any.
 *
 * "index_pages" is the number of pages from the index that we expect to scan,
 * or -1 if we don't expect to scan any.
 *
 * "max_workers" is caller's limit on the number of workers.  This typically
 * comes from a GUC.
 */
|
2017-02-15 19:53:24 +01:00
|
|
|
int
|
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 19:25:55 +01:00
|
|
|
compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
|
|
|
|
int max_workers)
|
2017-01-18 19:50:35 +01:00
|
|
|
{
|
Replace min_parallel_relation_size with two new GUCs.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
2017-02-15 19:37:24 +01:00
|
|
|
int parallel_workers = 0;
|
2017-01-18 19:50:35 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the user has set the parallel_workers reloption, use that; otherwise
|
|
|
|
* select a default number of workers.
|
|
|
|
*/
|
|
|
|
if (rel->rel_parallel_workers != -1)
|
|
|
|
parallel_workers = rel->rel_parallel_workers;
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
2017-03-14 19:33:14 +01:00
|
|
|
* If the number of pages being scanned is insufficient to justify a
|
|
|
|
* parallel scan, just return zero ... unless it's an inheritance
|
|
|
|
* child. In that case, we want to generate a parallel path here
|
|
|
|
* anyway. It might not be worthwhile just for this relation, but
|
|
|
|
* when combined with all of its inheritance siblings it may well pay
|
|
|
|
* off.
|
2017-01-18 19:50:35 +01:00
|
|
|
*/
|
2017-03-14 19:33:14 +01:00
|
|
|
if (rel->reloptkind == RELOPT_BASEREL &&
|
|
|
|
((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:35:54 +02:00
|
|
|
(index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
|
2017-01-18 19:50:35 +01:00
|
|
|
return 0;

		if (heap_pages >= 0)
		{
			int			heap_parallel_threshold;
			int			heap_parallel_workers = 1;

			/*
			 * Select the number of workers based on the log of the size of
			 * the relation.  This probably needs to be a good deal more
			 * sophisticated, but we need something here for now.  Note that
			 * the upper limit of the min_parallel_table_scan_size GUC is
			 * chosen to prevent overflow here.
			 */
			heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
			while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
			{
				heap_parallel_workers++;
				heap_parallel_threshold *= 3;
				if (heap_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			parallel_workers = heap_parallel_workers;
		}
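
		/*
		 * Worked example (added commentary, not in the upstream source):
		 * with min_parallel_table_scan_size = 1024 blocks, the loop above
		 * yields 1 worker for heaps of up to 3071 pages, 2 workers up to
		 * 9215 pages, 3 workers up to 27647 pages, and so on, i.e. roughly
		 * 1 + floor(log3(heap_pages / 1024)).
		 */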

		if (index_pages >= 0)
		{
			int			index_parallel_workers = 1;
			int			index_parallel_threshold;

			/* same calculation as for heap_pages above */
			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
			{
				index_parallel_workers++;
				index_parallel_threshold *= 3;
				if (index_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			if (parallel_workers > 0)
				parallel_workers = Min(parallel_workers, index_parallel_workers);
			else
				parallel_workers = index_parallel_workers;
		}
	}
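
	/*
	 * Added commentary (interpretation, not in the upstream source): when
	 * both heap_pages and index_pages are supplied, the Min() above caps
	 * the worker count at whichever of the two scans justifies fewer
	 * workers, treating the smaller structure as the limiting factor.
	 */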

	/* In no case use more than caller supplied maximum number of workers */
	parallel_workers = Min(parallel_workers, max_workers);

	return parallel_workers;
}
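
/*
 * Usage sketch (added commentary, hedged; not part of the upstream file):
 * a typical caller is the sequential-scan partial-path generator, roughly
 *
 *		parallel_workers = compute_parallel_worker(rel, rel->pages, -1,
 *												   max_parallel_workers_per_gather);
 *
 * where index_pages is passed as -1 because no index is involved.
 */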

/*
 * generate_partitionwise_join_paths
 *		Create paths representing partitionwise join for given partitioned
 *		join relation.
 *
 * This must not be called until after we are done adding paths for all
 * child-joins.  Otherwise, add_path might delete a path to which some path
 * generated here has a reference.
 */
void
generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
{
	List	   *live_children = NIL;
	int			cnt_parts;
	int			num_parts;
	RelOptInfo **part_rels;

	/* Handle only join relations here. */
	if (!IS_JOIN_REL(rel))
		return;

	/* We've nothing to do if the relation is not partitioned. */
	if (!IS_PARTITIONED_REL(rel))
		return;

	/* The relation should have consider_partitionwise_join set. */
	Assert(rel->consider_partitionwise_join);
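
	/*
	 * Added note (hedged, not in the upstream source): this flag is only
	 * set when the enable_partitionwise_join GUC is on and the rel's
	 * properties permit the optimization, so callers are expected to have
	 * vetted the rel before we get here.
	 */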

	/* Guard against stack overflow due to overly deep partition hierarchy. */
	check_stack_depth();

	num_parts = rel->nparts;
	part_rels = rel->part_rels;

	/* Collect non-dummy child-joins. */
	for (cnt_parts = 0; cnt_parts < num_parts; cnt_parts++)
	{
		RelOptInfo *child_rel = part_rels[cnt_parts];

		/* If it's been pruned entirely, it's certainly dummy. */
		if (child_rel == NULL)
			continue;

		/* Add partitionwise join paths for partitioned child-joins. */
		generate_partitionwise_join_paths(root, child_rel);

		set_cheapest(child_rel);
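
		/*
		 * Added note (not in the upstream source): set_cheapest() must run
		 * here so the child's path list is finalized before
		 * add_paths_to_append_rel() below consults the children's cheapest
		 * paths.
		 */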

		/* Dummy children will not be scanned, so ignore those. */
		if (IS_DUMMY_REL(child_rel))
			continue;

#ifdef OPTIMIZER_DEBUG
		debug_print_rel(root, child_rel);
#endif

		live_children = lappend(live_children, child_rel);
	}

	/* If all child-joins are dummy, parent join is also dummy. */
	if (!live_children)
	{
		mark_dummy_rel(rel);
		return;
	}

	/* Build additional paths for this rel from child-join paths. */
	add_paths_to_append_rel(root, rel, live_children);
	list_free(live_children);
}
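
/*
 * Usage note (added commentary, hedged): in the upstream tree this function
 * is invoked from the join-search code (e.g. standard_join_search()) once
 * for each join relation at a given level, after all paths for that level
 * have been generated; it then recurses into partitioned child-joins as
 * seen above.
 */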

/*****************************************************************************
 *			DEBUG SUPPORT
 *****************************************************************************/

#ifdef OPTIMIZER_DEBUG

static void
print_relids(PlannerInfo *root, Relids relids)
{
	int			x;
	bool		first = true;

	x = -1;
	while ((x = bms_next_member(relids, x)) >= 0)
	{
		if (!first)
			printf(" ");
		if (x < root->simple_rel_array_size &&
			root->simple_rte_array[x])
			printf("%s", root->simple_rte_array[x]->eref->aliasname);
		else
			printf("%d", x);
		first = false;
	}
}
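
/*
 * Example output (added commentary): for a relid set covering two base
 * rels aliased "a" and "b", print_relids() emits "a b"; any relid with no
 * matching RTE entry falls back to its numeric index.
 */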

static void
print_restrictclauses(PlannerInfo *root, List *clauses)
{
	ListCell   *l;

	foreach(l, clauses)
	{
		RestrictInfo *c = lfirst(l);

		print_expr((Node *) c->clause, root->parse->rtable);
		if (lnext(clauses, l))
			printf(", ");
	}
}
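
/*
 * Example output (added commentary): for the clauses a.x = b.x and
 * a.y > 1, this prints "a.x = b.x, a.y > 1", i.e. a comma-separated
 * rendering of each RestrictInfo's clause via print_expr().
 */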

static void
print_path(PlannerInfo *root, Path *path, int indent)
{
	const char *ptype;
	bool		join = false;
	Path	   *subpath = NULL;
	int			i;

	switch (nodeTag(path))
	{
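		/*
		 * Added commentary (not in the upstream source): each arm below
		 * records a human-readable node name in ptype; join paths also set
		 * "join" so both input paths can be printed recursively, while
		 * single-child paths record their child in subpath for the same
		 * purpose.
		 */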
|
1997-09-08 04:41:22 +02:00
|
|
|
case T_Path:
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
switch (path->pathtype)
|
|
|
|
{
|
|
|
|
case T_SeqScan:
|
|
|
|
ptype = "SeqScan";
|
|
|
|
break;
|
|
|
|
case T_SampleScan:
|
|
|
|
ptype = "SampleScan";
|
|
|
|
break;
|
|
|
|
case T_FunctionScan:
|
|
|
|
ptype = "FunctionScan";
|
|
|
|
break;
|
2017-03-08 16:39:37 +01:00
|
|
|
case T_TableFuncScan:
|
|
|
|
ptype = "TableFuncScan";
|
|
|
|
break;
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
case T_ValuesScan:
|
|
|
|
ptype = "ValuesScan";
|
|
|
|
break;
|
|
|
|
case T_CteScan:
|
|
|
|
ptype = "CteScan";
|
|
|
|
break;
|
In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
|
|
|
case T_NamedTuplestoreScan:
|
|
|
|
ptype = "NamedTuplestoreScan";
|
|
|
|
break;
|
|
|
|
case T_Result:
|
|
|
|
ptype = "Result";
|
|
|
|
break;
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
case T_WorkTableScan:
|
|
|
|
ptype = "WorkTableScan";
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
ptype = "???Path";
|
|
|
|
break;
|
|
|
|
}
|
1997-09-08 04:41:22 +02:00
|
|
|
break;
|
|
|
|
case T_IndexPath:
|
|
|
|
ptype = "IdxScan";
|
|
|
|
break;
|
2005-04-20 00:35:18 +02:00
|
|
|
case T_BitmapHeapPath:
|
|
|
|
ptype = "BitmapHeapScan";
|
|
|
|
break;
|
2005-04-21 21:18:13 +02:00
|
|
|
case T_BitmapAndPath:
|
|
|
|
ptype = "BitmapAndPath";
|
|
|
|
break;
|
|
|
|
case T_BitmapOrPath:
|
|
|
|
ptype = "BitmapOrPath";
|
|
|
|
break;
|
2001-10-18 18:11:42 +02:00
|
|
|
case T_TidPath:
|
|
|
|
ptype = "TidScan";
|
|
|
|
break;
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
case T_SubqueryScanPath:
|
In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
|
|
|
ptype = "SubqueryScan";
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
break;
|
2011-02-20 06:17:18 +01:00
|
|
|
case T_ForeignPath:
|
|
|
|
ptype = "ForeignScan";
|
|
|
|
break;
|
2018-07-19 02:54:39 +02:00
|
|
|
case T_CustomPath:
|
|
|
|
ptype = "CustomScan";
|
|
|
|
break;
|
|
|
|
case T_NestPath:
|
|
|
|
ptype = "NestLoop";
|
|
|
|
join = true;
|
|
|
|
break;
|
|
|
|
case T_MergePath:
|
|
|
|
ptype = "MergeJoin";
|
|
|
|
join = true;
|
|
|
|
break;
|
|
|
|
case T_HashPath:
|
|
|
|
ptype = "HashJoin";
|
|
|
|
join = true;
|
|
|
|
break;
|
2002-11-06 01:00:45 +01:00
|
|
|
case T_AppendPath:
|
|
|
|
ptype = "Append";
|
|
|
|
break;
|
2010-10-14 22:56:39 +02:00
|
|
|
case T_MergeAppendPath:
|
|
|
|
ptype = "MergeAppend";
|
|
|
|
break;
|
In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner. It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it. prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer. We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about. Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall. However testing says that the
penalty is very small, close to the noise level. In more complex queries,
this is able to find optimizations that we could not find before.
The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before). To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)
Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.
Patch by me, reviewed by David Rowley and Mark Dilger
Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 23:54:10 +01:00
|
|
|
case T_GroupResultPath:
|
|
|
|
ptype = "GroupResult";
|
2002-11-30 06:21:03 +01:00
|
|
|
break;
|
|
|
|
case T_MaterialPath:
|
|
|
|
ptype = "Material";
|
|
|
|
subpath = ((MaterialPath *) path)->subpath;
|
2002-11-06 01:00:45 +01:00
|
|
|
break;
|
2003-01-20 19:55:07 +01:00
|
|
|
case T_UniquePath:
|
|
|
|
ptype = "Unique";
|
|
|
|
subpath = ((UniquePath *) path)->subpath;
|
|
|
|
break;
|
2015-12-02 14:19:50 +01:00
|
|
|
case T_GatherPath:
|
|
|
|
ptype = "Gather";
|
|
|
|
subpath = ((GatherPath *) path)->subpath;
|
|
|
|
break;
|
2018-07-19 02:54:39 +02:00
|
|
|
case T_GatherMergePath:
|
|
|
|
ptype = "GatherMerge";
|
|
|
|
subpath = ((GatherMergePath *) path)->subpath;
|
|
|
|
break;
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
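A schematic C sketch of the add_path() idea referred to above (the struct
and function here are simplified stand-ins, not the real pathnode.c API,
which also compares pathkeys, parameterization, and parallel safety, and
retires old paths that the newcomer dominates):

    #include <stdlib.h>

    /* Simplified stand-in for a Path: just the two cost dimensions */
    typedef struct SketchPath
    {
        double      startup_cost;
        double      total_cost;
        struct SketchPath *next;
    } SketchPath;

    /* Keep "newpath" only if no already-kept path dominates it on both costs */
    static SketchPath *
    sketch_add_path(SketchPath *list, SketchPath *newpath)
    {
        SketchPath *p;

        for (p = list; p != NULL; p = p->next)
        {
            if (p->startup_cost <= newpath->startup_cost &&
                p->total_cost <= newpath->total_cost)
            {
                free(newpath);  /* dominated: discard it */
                return list;
            }
        }
        newpath->next = list;   /* undominated: keep it */
        return newpath;
    }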
|
|
|
case T_ProjectionPath:
|
|
|
|
ptype = "Projection";
|
|
|
|
subpath = ((ProjectionPath *) path)->subpath;
|
|
|
|
break;
|
Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set-returning functions (SRFs) in the targetlist (like SELECT
generate_series(1,5)) was so far done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, makes it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would have allowed us to retain
the old "least common multiple" behavior when multiple SRFs are present in
one targetlist (i.e. continue returning rows until all SRFs reach the end
of their input at the same time), we decided to instead only return rows
until all SRFs are exhausted, returning NULL for already-exhausted ones.
We deemed the previous behavior too confusing, unexpected, and not
particularly useful.
As a side effect, the previously prohibited case of multiple set-returning
arguments to a function is now allowed. Not because it's particularly
desirable, but because it ends up working, and there seems to be no
argument for adding code to prohibit it.
Currently the behavior of COALESCE and CASE containing SRFs has changed:
they return multiple rows from the expression, even when the SRF-containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of now-superfluous code dealing with set-returning expressions
around. But as the changes to get rid of it are verbose and largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 21:46:50 +01:00
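For example, SELECT generate_series(1,3), generate_series(1,2) now returns
three rows, with NULL in the second column once that SRF is exhausted,
rather than the old least-common-multiple behavior of six rows.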
|
|
|
case T_ProjectSetPath:
|
|
|
|
ptype = "ProjectSet";
|
|
|
|
subpath = ((ProjectSetPath *) path)->subpath;
|
|
|
|
break;
|
2016-03-07 21:58:22 +01:00
|
|
|
case T_SortPath:
|
|
|
|
ptype = "Sort";
|
|
|
|
subpath = ((SortPath *) path)->subpath;
|
|
|
|
break;
|
Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example, when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3), we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.
This has a number of benefits:
- Reduced memory consumption, because only a single group (determined by
values in the sorted prefix) needs to be kept in memory. This may also
eliminate the need to spill to disk.
- Lower startup cost, because Incremental Sort produces results after each
prefix group, which is beneficial for plans where startup cost matters
(for example, queries with a LIMIT clause).
We consider both Sort and Incremental Sort, and decide based on costing.
The implemented algorithm operates in two different modes:
- Fetching a minimum number of tuples without checking equality on the
prefix keys, and sorting on all columns when safe.
- Fetching all tuples for a single prefix group and then sorting by
comparing only the remaining (non-prefix) keys.
We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial; the goal is to minimize
the number of unnecessary comparisons while keeping memory consumption
below work_mem.
This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.
There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.
Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:33:28 +02:00
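To make the prefix-group idea concrete, here is a free-standing C sketch
(our own toy code, not the executor's implementation, which works on
tuples and switches between the two modes described above):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct
    {
        int         key1;
        int         key2;
    } Row;                      /* toy row; input assumed presorted by key1 */

    static int
    cmp_key2(const void *a, const void *b)
    {
        const Row  *ra = (const Row *) a;
        const Row  *rb = (const Row *) b;

        return (ra->key2 > rb->key2) - (ra->key2 < rb->key2);
    }

    /* Sort by (key1, key2) given existing key1 order: only one key1-group
     * is held for sorting at a time, which is the memory benefit above. */
    static void
    incremental_sort(Row *rows, size_t n)
    {
        size_t      start = 0;

        for (size_t i = 1; i <= n; i++)
        {
            if (i == n || rows[i].key1 != rows[start].key1)
            {
                qsort(rows + start, i - start, sizeof(Row), cmp_key2);
                start = i;
            }
        }
    }

    int
    main(void)
    {
        Row         rows[] = {{1, 3}, {1, 1}, {2, 2}, {2, 0}};

        incremental_sort(rows, 4);
        for (int i = 0; i < 4; i++)
            printf("(%d,%d)\n", rows[i].key1, rows[i].key2);
        return 0;
    }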
|
|
|
case T_IncrementalSortPath:
|
|
|
|
ptype = "IncrementalSort";
|
|
|
|
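/* IncrementalSortPath embeds a SortPath as its first field, so this cast is valid */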
subpath = ((SortPath *) path)->subpath;
|
|
|
|
break;
|
2016-03-07 21:58:22 +01:00
|
|
|
case T_GroupPath:
|
|
|
|
ptype = "Group";
|
|
|
|
subpath = ((GroupPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_UpperUniquePath:
|
|
|
|
ptype = "UpperUnique";
|
|
|
|
subpath = ((UpperUniquePath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_AggPath:
|
|
|
|
ptype = "Agg";
|
|
|
|
subpath = ((AggPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_GroupingSetsPath:
|
|
|
|
ptype = "GroupingSets";
|
|
|
|
subpath = ((GroupingSetsPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_MinMaxAggPath:
|
|
|
|
ptype = "MinMaxAgg";
|
|
|
|
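/* MinMaxAggPath has no single subpath; its inputs live in mmaggregates */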
break;
|
|
|
|
case T_WindowAggPath:
|
|
|
|
ptype = "WindowAgg";
|
|
|
|
subpath = ((WindowAggPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_SetOpPath:
|
|
|
|
ptype = "SetOp";
|
|
|
|
subpath = ((SetOpPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_RecursiveUnionPath:
|
|
|
|
ptype = "RecursiveUnion";
|
|
|
|
break;
|
|
|
|
case T_LockRowsPath:
|
|
|
|
ptype = "LockRows";
|
|
|
|
subpath = ((LockRowsPath *) path)->subpath;
|
|
|
|
break;
|
|
|
|
case T_ModifyTablePath:
|
|
|
|
ptype = "ModifyTable";
|
|
|
|
break;
|
|
|
|
case T_LimitPath:
|
|
|
|
ptype = "Limit";
|
|
|
|
subpath = ((LimitPath *) path)->subpath;
|
|
|
|
break;
|
1997-09-08 04:41:22 +02:00
|
|
|
default:
|
2001-10-18 18:11:42 +02:00
|
|
|
ptype = "???Path";
|
1997-09-08 04:41:22 +02:00
|
|
|
break;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
2001-10-18 18:11:42 +02:00
|
|
|
|
|
|
|
for (i = 0; i < indent; i++)
|
|
|
|
printf("\t");
|
2002-11-06 01:00:45 +01:00
|
|
|
printf("%s", ptype);
|
|
|
|
|
|
|
|
if (path->parent)
|
|
|
|
{
|
|
|
|
printf("(");
|
2016-04-30 20:08:00 +02:00
|
|
|
print_relids(root, path->parent->relids);
|
|
|
|
printf(")");
|
|
|
|
}
|
|
|
|
if (path->param_info)
|
|
|
|
{
|
|
|
|
printf(" required_outer (");
|
|
|
|
print_relids(root, path->param_info->ppi_req_outer);
|
|
|
|
printf(")");
|
2002-11-06 01:00:45 +01:00
|
|
|
}
|
2016-04-30 20:08:00 +02:00
|
|
|
printf(" rows=%.0f cost=%.2f..%.2f\n",
|
|
|
|
path->rows, path->startup_cost, path->total_cost);
|
2001-10-18 18:11:42 +02:00
|
|
|
|
|
|
|
if (path->pathkeys)
|
|
|
|
{
|
|
|
|
for (i = 0; i < indent; i++)
|
|
|
|
printf("\t");
|
|
|
|
printf(" pathkeys: ");
|
2005-06-06 00:32:58 +02:00
|
|
|
print_pathkeys(path->pathkeys, root->parse->rtable);
|
2001-10-18 18:11:42 +02:00
|
|
|
}
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
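/* for join paths, show the join clauses and recurse into both inputs */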
if (join)
|
|
|
|
{
|
2001-10-18 18:11:42 +02:00
|
|
|
JoinPath *jp = (JoinPath *) path;
|
2000-02-15 21:49:31 +01:00
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
for (i = 0; i < indent; i++)
|
|
|
|
printf("\t");
|
|
|
|
printf(" clauses: ");
|
|
|
|
print_restrictclauses(root, jp->joinrestrictinfo);
|
|
|
|
printf("\n");
|
2000-02-15 21:49:31 +01:00
|
|
|
|
2002-11-06 01:00:45 +01:00
|
|
|
if (IsA(path, MergePath))
|
2000-02-15 21:49:31 +01:00
|
|
|
{
|
2001-10-18 18:11:42 +02:00
|
|
|
MergePath *mp = (MergePath *) path;
|
2000-02-15 21:49:31 +01:00
|
|
|
|
2009-11-15 03:45:35 +01:00
|
|
|
for (i = 0; i < indent; i++)
|
|
|
|
printf("\t");
|
|
|
|
printf(" sortouter=%d sortinner=%d materializeinner=%d\n",
|
|
|
|
((mp->outersortkeys) ? 1 : 0),
|
|
|
|
((mp->innersortkeys) ? 1 : 0),
|
|
|
|
((mp->materialize_inner) ? 1 : 0));
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
2001-10-18 18:11:42 +02:00
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
print_path(root, jp->outerjoinpath, indent + 1);
|
|
|
|
print_path(root, jp->innerjoinpath, indent + 1);
|
|
|
|
}
|
2002-11-30 06:21:03 +01:00
|
|
|
|
|
|
|
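/* single-input paths set "subpath" above; recurse one level deeper */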
if (subpath)
|
|
|
|
print_path(root, subpath, indent + 1);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
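/* Dump a RelOptInfo and all of its saved paths to stdout (OPTIMIZER_DEBUG) */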
void
|
2005-06-06 00:32:58 +02:00
|
|
|
debug_print_rel(PlannerInfo *root, RelOptInfo *rel)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2004-05-26 06:41:50 +02:00
|
|
|
ListCell *l;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
printf("RELOPTINFO (");
|
2016-04-30 20:08:00 +02:00
|
|
|
print_relids(root, rel->relids);
|
2016-03-14 21:59:59 +01:00
|
|
|
printf("): rows=%.0f width=%d\n", rel->rows, rel->reltarget->width);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
if (rel->baserestrictinfo)
|
|
|
|
{
|
|
|
|
printf("\tbaserestrictinfo: ");
|
|
|
|
print_restrictclauses(root, rel->baserestrictinfo);
|
|
|
|
printf("\n");
|
|
|
|
}
|
|
|
|
|
2005-06-09 06:19:00 +02:00
|
|
|
if (rel->joininfo)
|
2001-10-18 18:11:42 +02:00
|
|
|
{
|
2005-06-09 06:19:00 +02:00
|
|
|
printf("\tjoininfo: ");
|
|
|
|
print_restrictclauses(root, rel->joininfo);
|
2001-10-18 18:11:42 +02:00
|
|
|
printf("\n");
|
|
|
|
}
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
printf("\tpath list:\n");
|
|
|
|
foreach(l, rel->pathlist)
|
|
|
|
print_path(root, lfirst(l), 1);
|
2016-04-30 20:08:00 +02:00
|
|
|
if (rel->cheapest_parameterized_paths)
|
|
|
|
{
|
|
|
|
printf("\n\tcheapest parameterized paths:\n");
|
|
|
|
foreach(l, rel->cheapest_parameterized_paths)
|
|
|
|
print_path(root, lfirst(l), 1);
|
|
|
|
}
|
2012-08-08 01:02:54 +02:00
|
|
|
if (rel->cheapest_startup_path)
|
|
|
|
{
|
|
|
|
printf("\n\tcheapest startup path:\n");
|
|
|
|
print_path(root, rel->cheapest_startup_path, 1);
|
|
|
|
}
|
|
|
|
if (rel->cheapest_total_path)
|
|
|
|
{
|
|
|
|
printf("\n\tcheapest total path:\n");
|
|
|
|
print_path(root, rel->cheapest_total_path, 1);
|
|
|
|
}
|
2001-10-18 18:11:42 +02:00
|
|
|
printf("\n");
|
|
|
|
fflush(stdout);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
2001-10-28 07:26:15 +01:00
|
|
|
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
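A worked example of the tab-stop rule above (the column numbers are our own
illustration): with the default comment column at 33, code running through
column 34 forces a trailing comment rightward; 8-space tab stops would place
it at column 41, while 4-space tab stops place it at column 37, one tab stop
further left.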
|
|
|
#endif /* OPTIMIZER_DEBUG */
|