1996-07-09 08:22:35 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
1999-02-14 00:22:53 +01:00
|
|
|
* paths.h
|
2005-06-06 00:32:58 +02:00
|
|
|
* prototypes for various files in optimizer/path
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
*
|
2017-01-03 19:48:53 +01:00
|
|
|
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
|
2000-01-26 06:58:53 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/include/optimizer/paths.h
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
1997-09-07 07:04:48 +02:00
|
|
|
#ifndef PATHS_H
|
|
|
|
#define PATHS_H
|
1996-07-09 08:22:35 +02:00
|
|
|
|
1997-11-26 02:14:33 +01:00
|
|
|
#include "nodes/relation.h"
|
|
|
|
|
2003-01-24 04:58:44 +01:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1999-07-27 08:23:12 +02:00
|
|
|
* allpaths.c
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2000-01-23 00:50:30 +01:00
|
|
|
extern bool enable_geqo;
|
2003-01-26 00:10:30 +01:00
|
|
|
extern int geqo_threshold;
|
Replace min_parallel_relation_size with two new GUCs.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
2017-02-15 19:37:24 +01:00
|
|
|
extern int min_parallel_table_scan_size;
|
|
|
|
extern int min_parallel_index_scan_size;
|
2014-11-21 20:05:46 +01:00
|
|
|
|
|
|
|
/* Hook for plugins to get control in set_rel_pathlist() */
|
|
|
|
typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
|
2017-06-21 20:39:04 +02:00
|
|
|
RelOptInfo *rel,
|
|
|
|
Index rti,
|
|
|
|
RangeTblEntry *rte);
|
2014-11-21 20:05:46 +01:00
|
|
|
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
|
2015-05-01 14:50:35 +02:00
|
|
|
|
|
|
|
/* Hook for plugins to get control in add_paths_to_joinrel() */
|
|
|
|
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
|
2017-06-21 20:39:04 +02:00
|
|
|
RelOptInfo *joinrel,
|
|
|
|
RelOptInfo *outerrel,
|
|
|
|
RelOptInfo *innerrel,
|
|
|
|
JoinType jointype,
|
|
|
|
JoinPathExtraData *extra);
|
2015-05-01 14:50:35 +02:00
|
|
|
extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
|
2000-01-23 00:50:30 +01:00
|
|
|
|
2007-09-26 20:51:51 +02:00
|
|
|
/* Hook for plugins to replace standard_join_search() */
|
2007-11-15 22:14:46 +01:00
|
|
|
typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
|
2017-06-21 20:39:04 +02:00
|
|
|
int levels_needed,
|
|
|
|
List *initial_rels);
|
2007-09-26 20:51:51 +02:00
|
|
|
extern PGDLLIMPORT join_search_hook_type join_search_hook;
|
|
|
|
|
|
|
|
|
2005-12-20 03:30:36 +01:00
|
|
|
extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
extern void set_dummy_rel_pathlist(RelOptInfo *rel);
|
2007-09-26 20:51:51 +02:00
|
|
|
extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
|
2007-11-15 22:14:46 +01:00
|
|
|
List *initial_rels);
|
1996-07-09 08:22:35 +02:00
|
|
|
|
2016-01-20 20:29:22 +01:00
|
|
|
extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
|
2017-03-14 19:33:14 +01:00
|
|
|
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
|
|
|
|
double index_pages);
|
Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 18:05:43 +01:00
|
|
|
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
|
2017-05-17 22:31:56 +02:00
|
|
|
Path *bitmapqual);
|
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 17:11:10 +02:00
|
|
|
extern void generate_partition_wise_join_paths(PlannerInfo *root,
|
|
|
|
RelOptInfo *rel);
|
2016-01-20 20:29:22 +01:00
|
|
|
|
2001-10-18 18:11:42 +02:00
|
|
|
#ifdef OPTIMIZER_DEBUG
|
2005-06-06 00:32:58 +02:00
|
|
|
extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
|
2001-10-18 18:11:42 +02:00
|
|
|
#endif
|
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1999-07-27 08:23:12 +02:00
|
|
|
* indxpath.c
|
1997-09-07 07:04:48 +02:00
|
|
|
* routines to generate index paths
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2005-06-06 00:32:58 +02:00
|
|
|
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
|
2009-09-17 22:49:29 +02:00
|
|
|
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
|
2011-10-26 23:52:02 +02:00
|
|
|
List *restrictlist,
|
|
|
|
List *exprlist, List *oprlist);
|
2017-01-15 20:09:35 +01:00
|
|
|
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
|
|
|
|
int indexcol);
|
2005-04-12 01:06:57 +02:00
|
|
|
extern bool match_index_to_operand(Node *operand, int indexcol,
|
|
|
|
IndexOptInfo *index);
|
2011-12-25 01:03:21 +01:00
|
|
|
extern void expand_indexqual_conditions(IndexOptInfo *index,
|
|
|
|
List *indexclauses, List *indexclausecols,
|
|
|
|
List **indexquals_p, List **indexqualcols_p);
|
Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate. This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all. So index-only scans were
essentially unavailable with such partial indexes.
To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner. This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes. However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes. We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed. We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.
Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 20:48:56 +02:00
|
|
|
extern void check_index_predicates(PlannerInfo *root, RelOptInfo *rel);
|
Improve planner's handling of duplicated index column expressions.
It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses. (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.) However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for. We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath. Then we can rely on the sublist
structure to identify target index columns in createplan.c. There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too. This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.
Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".
Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins. The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
2011-12-24 00:44:21 +01:00
|
|
|
extern Expr *adjust_rowcompare_for_index(RowCompareExpr *clause,
|
|
|
|
IndexOptInfo *index,
|
|
|
|
int indexcol,
|
|
|
|
List **indexcolnos,
|
|
|
|
bool *var_on_left_p);
|
1996-07-09 08:22:35 +02:00
|
|
|
|
1999-11-23 21:07:06 +01:00
|
|
|
/*
|
|
|
|
* tidpath.h
|
|
|
|
* routines to generate tid paths
|
|
|
|
*/
|
2005-06-06 00:32:58 +02:00
|
|
|
extern void create_tidscan_paths(PlannerInfo *root, RelOptInfo *rel);
|
1999-11-23 21:07:06 +01:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1999-07-27 08:23:12 +02:00
|
|
|
* joinpath.c
|
1997-09-07 07:04:48 +02:00
|
|
|
* routines to create join paths
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2005-06-06 00:32:58 +02:00
|
|
|
extern void add_paths_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
|
2008-08-14 20:48:00 +02:00
|
|
|
RelOptInfo *outerrel, RelOptInfo *innerrel,
|
|
|
|
JoinType jointype, SpecialJoinInfo *sjinfo,
|
2001-03-22 05:01:46 +01:00
|
|
|
List *restrictlist);
|
2000-02-07 05:41:04 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* joinrels.c
|
|
|
|
* routines to determine which relations to join
|
|
|
|
*/
|
2009-11-28 01:46:19 +01:00
|
|
|
extern void join_search_one_level(PlannerInfo *root, int level);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern RelOptInfo *make_join_rel(PlannerInfo *root,
|
2005-12-20 03:30:36 +01:00
|
|
|
RelOptInfo *rel1, RelOptInfo *rel2);
|
Restructure code that is responsible for ensuring that clauseless joins are
considered when it is necessary to do so because of a join-order restriction
(that is, an outer-join or IN-subselect construct). The former coding was a
bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario
Weilguni's recent bug report. His specific problem was that an IN could be
turned into a "clauseless" join due to constant-propagation removing the IN's
joinclause, and if the IN's subselect involved more than one relation and
there was more than one such IN linking to the same upper relation, then the
only valid join orders involve "bushy" plans but we would fail to consider the
specific paths needed to get there. (See the example case added to the join
regression test.) On examining the code I wonder if there weren't some other
problem cases too; in particular it seems that GEQO was defending against a
different set of corner cases than the main planner was. There was also an
efficiency problem, in that when we did realize we needed a clauseless join
because of an IN, we'd consider clauseless joins against every other relation
whether this was sensible or not. It seems a better design is to use the
outer-join and in-clause lists as a backup heuristic, just as the rule of
joining only where there are joinclauses is a heuristic: we'll join two
relations if they have a usable joinclause *or* this might be necessary to
satisfy an outer-join or IN-clause join order restriction. I refactored the
code to have just one place considering this instead of three, and made sure
that it covered all the cases that any of them had been considering.
Backpatch as far as 8.1 (which has only the IN-clause form of the disease).
By rights 8.0 and 7.4 should have the bug too, but they accidentally fail
to fail, because the joininfo structure used in those releases preserves some
memory of there having once been a joinclause between the inner and outer
sides of an IN, and so it leads the code in the right direction anyway.
I'll be conservative and not touch them.
2007-02-16 01:14:01 +01:00
|
|
|
extern bool have_join_order_restriction(PlannerInfo *root,
|
|
|
|
RelOptInfo *rel1, RelOptInfo *rel2);
|
Still more fixes for planner's handling of LATERAL references.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references. If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too. The planner did not understand this and would generate
intermediate joins that could not be used. While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.
To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent. This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.
Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations. This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.
Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1dd02f1. Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures. The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen. I chose to move the
supporting logic into joinrels.c where the latter functions are.
Back-patch to 9.3 where LATERAL support was introduced.
2015-12-11 20:22:20 +01:00
|
|
|
extern bool have_dangerous_phv(PlannerInfo *root,
|
|
|
|
Relids outer_relids, Relids inner_params);
|
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 17:11:10 +02:00
|
|
|
extern void mark_dummy_rel(RelOptInfo *rel);
|
|
|
|
extern bool have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2,
|
|
|
|
JoinType jointype, List *restrictlist);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2007-01-20 21:45:41 +01:00
|
|
|
/*
|
|
|
|
* equivclass.c
|
|
|
|
* routines for managing EquivalenceClasses
|
|
|
|
*/
|
2013-03-22 00:43:59 +01:00
|
|
|
typedef bool (*ec_matches_callback_type) (PlannerInfo *root,
|
2017-06-21 20:39:04 +02:00
|
|
|
RelOptInfo *rel,
|
|
|
|
EquivalenceClass *ec,
|
|
|
|
EquivalenceMember *em,
|
|
|
|
void *arg);
|
2013-03-22 00:43:59 +01:00
|
|
|
|
Reduce "X = X" to "X IS NOT NULL", if it's easy to do so.
If the operator is a strict btree equality operator, and X isn't volatile,
then the clause must yield true for any non-null value of X, or null if X
is null. At top level of a WHERE clause, we can ignore the distinction
between false and null results, so it's valid to simplify the clause to
"X IS NOT NULL". This is a useful improvement mainly because we'll get
a far better selectivity estimate in most cases.
Because such cases seldom arise in well-written queries, it is unappetizing
to expend a lot of planner cycles looking for them ... but it turns out
that there's a place we can shoehorn this in practically for free, because
equivclass.c already has to detect and reject candidate equivalences of the
form X = X. That doesn't catch every place that it would be valid to
simplify to X IS NOT NULL, but it catches the typical case. Working harder
doesn't seem justified.
Patch by me, reviewed by Petr Jelinek
Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
2017-10-08 18:23:32 +02:00
|
|
|
extern bool process_equivalence(PlannerInfo *root,
|
|
|
|
RestrictInfo **p_restrictinfo,
|
2007-11-15 22:14:46 +01:00
|
|
|
bool below_outer_join);
|
2011-03-20 01:29:08 +01:00
|
|
|
extern Expr *canonicalize_ec_expression(Expr *expr,
|
2011-04-10 17:42:00 +02:00
|
|
|
Oid req_type, Oid req_collation);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern void reconsider_outer_join_clauses(PlannerInfo *root);
|
|
|
|
extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
|
|
|
|
Expr *expr,
|
Compute correct em_nullable_relids in get_eclass_for_sort_expr().
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates. I'd worried about this in the commit message
for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs. That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.
Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc. We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times. In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins. There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though. There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.
Back-patch to 9.2, like the previous patch in this area.
2013-11-15 22:46:18 +01:00
|
|
|
Relids nullable_relids,
|
2007-11-08 22:49:48 +01:00
|
|
|
List *opfamilies,
|
2011-03-20 01:29:08 +01:00
|
|
|
Oid opcintype,
|
|
|
|
Oid collation,
|
2010-10-29 17:52:16 +02:00
|
|
|
Index sortref,
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
Relids rel,
|
2010-10-29 17:52:16 +02:00
|
|
|
bool create_it);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern void generate_base_implied_equalities(PlannerInfo *root);
|
|
|
|
extern List *generate_join_implied_equalities(PlannerInfo *root,
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
Relids join_relids,
|
|
|
|
Relids outer_relids,
|
2007-11-15 22:14:46 +01:00
|
|
|
RelOptInfo *inner_rel);
|
Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like
Nested Loop
-> Scan X
-> Nested Loop
-> Scan Y
-> Scan Z
Filter: Z.Z = X.X
The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node. On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members. So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there. If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.
This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape. (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.) Many
thanks to Alexander Kirkouski for submitting a reproducible test case.
The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-30 02:19:38 +02:00
|
|
|
extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
|
|
|
|
List *eclasses,
|
|
|
|
Relids join_relids,
|
|
|
|
Relids outer_relids,
|
|
|
|
RelOptInfo *inner_rel);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
|
2016-06-18 21:22:34 +02:00
|
|
|
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
|
|
|
|
ForeignKeyOptInfo *fkinfo,
|
|
|
|
int colno);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern void add_child_rel_equivalences(PlannerInfo *root,
|
2007-11-15 22:14:46 +01:00
|
|
|
AppendRelInfo *appinfo,
|
|
|
|
RelOptInfo *parent_rel,
|
|
|
|
RelOptInfo *child_rel);
|
2013-03-22 00:43:59 +01:00
|
|
|
extern List *generate_implied_equalities_for_column(PlannerInfo *root,
|
|
|
|
RelOptInfo *rel,
|
|
|
|
ec_matches_callback_type callback,
|
|
|
|
void *callback_arg,
|
|
|
|
Relids prohibited_rels);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern bool have_relevant_eclass_joinclause(PlannerInfo *root,
|
|
|
|
RelOptInfo *rel1, RelOptInfo *rel2);
|
|
|
|
extern bool has_relevant_eclass_joinclause(PlannerInfo *root,
|
2007-11-15 22:14:46 +01:00
|
|
|
RelOptInfo *rel1);
|
Fix eclass_useful_for_merging to give valid results for appendrel children.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child. In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases. Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns. However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.
Backpatch to 9.3. In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur. It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
2015-08-07 02:14:37 +02:00
|
|
|
extern bool eclass_useful_for_merging(PlannerInfo *root,
|
|
|
|
EquivalenceClass *eclass,
|
2007-11-15 22:14:46 +01:00
|
|
|
RelOptInfo *rel);
|
Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate. We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.
In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage. This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.
To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing. This is required at both base scans and joins. It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree. Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 21:52:46 +02:00
|
|
|
extern bool is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist);
|
2007-01-20 21:45:41 +01:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1999-08-16 04:17:58 +02:00
|
|
|
* pathkeys.c
|
|
|
|
* utilities for matching and building path keys
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
1999-08-16 04:17:58 +02:00
|
|
|
typedef enum
|
|
|
|
{
|
2001-10-28 07:26:15 +01:00
|
|
|
PATHKEYS_EQUAL, /* pathkeys are identical */
|
|
|
|
PATHKEYS_BETTER1, /* pathkey 1 is a superset of pathkey 2 */
|
|
|
|
PATHKEYS_BETTER2, /* vice versa */
|
|
|
|
PATHKEYS_DIFFERENT /* neither pathkey includes the other */
|
1999-08-16 04:17:58 +02:00
|
|
|
} PathKeysComparison;
|
1996-07-09 08:22:35 +02:00
|
|
|
|
1999-08-16 04:17:58 +02:00
|
|
|
extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
|
|
|
|
extern bool pathkeys_contained_in(List *keys1, List *keys2);
|
1999-08-21 05:49:17 +02:00
|
|
|
extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
|
2012-01-28 01:26:38 +01:00
|
|
|
Relids required_outer,
|
2017-03-07 16:33:29 +01:00
|
|
|
CostSelector cost_criterion,
|
|
|
|
bool require_parallel_safe);
|
2000-02-15 21:49:31 +01:00
|
|
|
extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,
|
2000-04-12 19:17:23 +02:00
|
|
|
List *pathkeys,
|
2012-01-28 01:26:38 +01:00
|
|
|
Relids required_outer,
|
2000-04-12 19:17:23 +02:00
|
|
|
double fraction);
|
2017-03-07 16:33:29 +01:00
|
|
|
extern Path *get_cheapest_parallel_safe_total_inner(List *paths);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern List *build_index_pathkeys(PlannerInfo *root, IndexOptInfo *index,
|
2007-01-20 21:45:41 +01:00
|
|
|
ScanDirection scandir);
|
Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-22 01:37:02 +01:00
|
|
|
extern List *build_expression_pathkey(PlannerInfo *root, Expr *expr,
|
|
|
|
Relids nullable_relids, Oid opno,
|
|
|
|
Relids rel, bool create_it);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern List *convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
List *subquery_pathkeys,
|
|
|
|
List *subquery_tlist);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern List *build_join_pathkeys(PlannerInfo *root,
|
2001-03-22 05:01:46 +01:00
|
|
|
RelOptInfo *joinrel,
|
2005-01-23 03:21:36 +01:00
|
|
|
JoinType jointype,
|
2001-03-22 05:01:46 +01:00
|
|
|
List *outer_pathkeys);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern List *make_pathkeys_for_sortclauses(PlannerInfo *root,
|
|
|
|
List *sortclauses,
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 20:49:01 +02:00
|
|
|
List *tlist);
|
2010-10-29 17:52:16 +02:00
|
|
|
extern void initialize_mergeclause_eclasses(PlannerInfo *root,
|
2011-04-10 17:42:00 +02:00
|
|
|
RestrictInfo *restrictinfo);
|
2010-10-29 17:52:16 +02:00
|
|
|
extern void update_mergeclause_eclasses(PlannerInfo *root,
|
2011-04-10 17:42:00 +02:00
|
|
|
RestrictInfo *restrictinfo);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
|
2001-03-22 05:01:46 +01:00
|
|
|
List *pathkeys,
|
2007-01-20 21:45:41 +01:00
|
|
|
bool outer_keys,
|
2001-03-22 05:01:46 +01:00
|
|
|
List *restrictinfos);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
|
2007-11-15 22:14:46 +01:00
|
|
|
List *mergeclauses,
|
|
|
|
RelOptInfo *joinrel);
|
2007-01-20 21:45:41 +01:00
|
|
|
extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
|
2007-11-15 22:14:46 +01:00
|
|
|
List *mergeclauses,
|
|
|
|
List *outer_pathkeys);
|
2005-06-06 00:32:58 +02:00
|
|
|
extern List *truncate_useless_pathkeys(PlannerInfo *root,
|
2001-03-22 05:01:46 +01:00
|
|
|
RelOptInfo *rel,
|
|
|
|
List *pathkeys);
|
2007-04-15 22:09:28 +02:00
|
|
|
extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
|
2015-12-22 19:46:40 +01:00
|
|
|
extern PathKey *make_canonical_pathkey(PlannerInfo *root,
|
|
|
|
EquivalenceClass *eclass, Oid opfamily,
|
|
|
|
int strategy, bool nulls_first);
|
2001-10-28 07:26:15 +01:00
|
|
|
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
|
|
|
#endif /* PATHS_H */
|