2005-04-12 01:06:57 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* planagg.c
|
|
|
|
* Special planning for aggregate queries.
|
|
|
|
*
|
2010-11-04 17:01:17 +01:00
|
|
|
* This module tries to replace MIN/MAX aggregate functions by subqueries
|
|
|
|
* of the form
|
2011-03-22 05:34:31 +01:00
|
|
|
* (SELECT col FROM tab
|
|
|
|
* WHERE col IS NOT NULL AND existing-quals
|
|
|
|
* ORDER BY col ASC/DESC
|
|
|
|
* LIMIT 1)
|
2010-11-04 17:01:17 +01:00
|
|
|
* Given a suitable index on tab.col, this can be much faster than the
|
2014-05-06 18:12:18 +02:00
|
|
|
* generic scan-all-the-rows aggregation plan. We can handle multiple
|
2010-11-04 17:01:17 +01:00
|
|
|
* MIN/MAX aggregates by generating multiple subqueries, and their
|
2014-05-06 18:12:18 +02:00
|
|
|
* orderings can be different. However, if the query contains any
|
2010-11-04 17:01:17 +01:00
|
|
|
* non-optimizable aggregates, there's no point since we'll have to
|
|
|
|
* scan all the rows anyway.
|
|
|
|
*
|
|
|
|
*
|
2016-01-02 19:33:40 +01:00
|
|
|
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
|
2005-04-12 01:06:57 +02:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/optimizer/plan/planagg.c
|
2005-04-12 01:06:57 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
2012-08-30 22:15:44 +02:00
|
|
|
#include "access/htup_details.h"
|
2005-04-12 01:06:57 +02:00
|
|
|
#include "catalog/pg_aggregate.h"
|
|
|
|
#include "catalog/pg_type.h"
|
|
|
|
#include "nodes/makefuncs.h"
|
2008-08-26 00:42:34 +02:00
|
|
|
#include "nodes/nodeFuncs.h"
|
2005-04-12 01:06:57 +02:00
|
|
|
#include "optimizer/clauses.h"
|
|
|
|
#include "optimizer/cost.h"
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
#include "optimizer/pathnode.h"
|
2005-04-12 01:06:57 +02:00
|
|
|
#include "optimizer/paths.h"
|
|
|
|
#include "optimizer/planmain.h"
|
|
|
|
#include "optimizer/subselect.h"
|
Fix generation of MergeAppend plans for optimized min/max on expressions.
Before jamming a desired targetlist into a plan node, one really ought to
make sure the plan node can handle projections, and insert a buffering
Result plan node if not. planagg.c forgot to do this, which is a hangover
from the days when it only dealt with IndexScan plan types. MergeAppend
doesn't project though, not to mention that it gets unhappy if you remove
its possibly-resjunk sort columns. The code accidentally failed to fail
for cases in which the min/max argument was a simple Var, because the new
targetlist would be equivalent to the original "flat" tlist anyway.
For any more complex case, it's been broken since 9.1 where we introduced
the ability to optimize min/max using MergeAppend, as reported by Raphael
Bauduin. Fix by duplicating the logic from grouping_planner that decides
whether we need a Result node.
In 9.2 and 9.1, this requires back-porting the tlist_same_exprs() function
introduced in commit 4387cf956b9eb13aad569634e0c4df081d76e2e3, else we'd
uselessly add a Result node in cases that worked before. It's rather
tempting to back-patch that whole commit so that we can avoid extra Result
nodes in mainline cases too; but I'll refrain, since that code hasn't
really seen all that much field testing yet.
2013-11-07 19:13:12 +01:00
|
|
|
#include "optimizer/tlist.h"
|
2006-07-11 19:26:59 +02:00
|
|
|
#include "parser/parsetree.h"
|
2011-03-22 05:34:31 +01:00
|
|
|
#include "parser/parse_clause.h"
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
#include "rewrite/rewriteManip.h"
|
2005-04-12 01:06:57 +02:00
|
|
|
#include "utils/lsyscache.h"
|
|
|
|
#include "utils/syscache.h"
|
|
|
|
|
|
|
|
|
|
|
|
static bool find_minmax_aggs_walker(Node *node, List **context);
|
2011-03-22 05:34:31 +01:00
|
|
|
static bool build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
|
|
|
|
Oid eqop, Oid sortop, bool nulls_first);
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 20:49:01 +02:00
|
|
|
static void minmax_qp_callback(PlannerInfo *root, void *extra);
|
2005-04-12 01:06:57 +02:00
|
|
|
static Oid fetch_agg_sort_op(Oid aggfnoid);
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2010-11-04 17:01:17 +01:00
|
|
|
* preprocess_minmax_aggregates - preprocess MIN/MAX aggregates
|
2005-04-12 01:06:57 +02:00
|
|
|
*
|
2010-11-04 17:01:17 +01:00
|
|
|
* Check to see whether the query contains MIN/MAX aggregate functions that
|
|
|
|
* might be optimizable via indexscans. If it does, and all the aggregates
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* are potentially optimizable, then create a MinMaxAggPath and add it to
|
|
|
|
* the (UPPERREL_GROUP_AGG, NULL) upperrel.
|
|
|
|
*
|
|
|
|
* This should be called by grouping_planner() just before it's ready to call
|
|
|
|
* query_planner(), because we generate indexscan paths by cloning the
|
|
|
|
* planner's state and invoking query_planner() on a modified version of
|
|
|
|
* the query parsetree. Thus, all preprocessing needed before query_planner()
|
|
|
|
* must already be done.
|
2005-04-12 01:06:57 +02:00
|
|
|
*
|
2010-11-04 17:01:17 +01:00
|
|
|
* Note: we are passed the preprocessed targetlist separately, because it's
|
|
|
|
* not necessarily equal to root->parse->targetList.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
2010-11-04 17:01:17 +01:00
|
|
|
void
|
|
|
|
preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
|
2005-04-12 01:06:57 +02:00
|
|
|
{
|
2005-06-06 00:32:58 +02:00
|
|
|
Query *parse = root->parse;
|
2007-02-06 07:50:26 +01:00
|
|
|
FromExpr *jtnode;
|
2005-04-12 01:06:57 +02:00
|
|
|
RangeTblRef *rtr;
|
|
|
|
RangeTblEntry *rte;
|
|
|
|
List *aggs_list;
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
RelOptInfo *grouped_rel;
|
2010-11-04 17:01:17 +01:00
|
|
|
ListCell *lc;
|
|
|
|
|
|
|
|
/* minmax_aggs list should be empty at this point */
|
|
|
|
Assert(root->minmax_aggs == NIL);
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
/* Nothing to do if query has no aggregates */
|
2005-06-06 00:32:58 +02:00
|
|
|
if (!parse->hasAggs)
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
2005-10-15 04:49:52 +02:00
|
|
|
Assert(!parse->setOperations); /* shouldn't get here if a setop */
|
|
|
|
Assert(parse->rowMarks == NIL); /* nor if FOR UPDATE */
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Reject unoptimizable cases.
|
|
|
|
*
|
2008-12-28 19:54:01 +01:00
|
|
|
* We don't handle GROUP BY or windowing, because our current
|
2009-06-11 16:49:15 +02:00
|
|
|
* implementations of grouping require looking at all the rows anyway, and
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* so there's not much point in optimizing MIN/MAX.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
if (parse->groupClause || list_length(parse->groupingSets) > 1 ||
|
|
|
|
parse->hasWindowFuncs)
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* We also restrict the query to reference exactly one table, since join
|
|
|
|
* conditions can't be handled reasonably. (We could perhaps handle a
|
|
|
|
* query containing cartesian-product joins, but it hardly seems worth the
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
* trouble.) However, the single table could be buried in several levels
|
|
|
|
* of FromExpr due to subqueries. Note the "single" table could be an
|
|
|
|
* inheritance parent, too, including the case of a UNION ALL subquery
|
|
|
|
* that's been flattened to an appendrel.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
2007-02-06 07:50:26 +01:00
|
|
|
jtnode = parse->jointree;
|
|
|
|
while (IsA(jtnode, FromExpr))
|
|
|
|
{
|
|
|
|
if (list_length(jtnode->fromlist) != 1)
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2007-02-06 07:50:26 +01:00
|
|
|
jtnode = linitial(jtnode->fromlist);
|
|
|
|
}
|
|
|
|
if (!IsA(jtnode, RangeTblRef))
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2007-02-06 07:50:26 +01:00
|
|
|
rtr = (RangeTblRef *) jtnode;
|
2007-04-21 23:01:45 +02:00
|
|
|
rte = planner_rt_fetch(rtr->rtindex, root);
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
if (rte->rtekind == RTE_RELATION)
|
2012-06-10 21:20:04 +02:00
|
|
|
/* ordinary relation, ok */ ;
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
else if (rte->rtekind == RTE_SUBQUERY && rte->inh)
|
2012-06-10 21:20:04 +02:00
|
|
|
/* flattened UNION ALL subquery, ok */ ;
|
Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 18:11:12 +01:00
|
|
|
else
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
/*
|
2010-11-04 17:01:17 +01:00
|
|
|
* Scan the tlist and HAVING qual to find all the aggregates and verify
|
2014-05-06 18:12:18 +02:00
|
|
|
* all are MIN/MAX aggregates. Stop as soon as we find one that isn't.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
|
|
|
aggs_list = NIL;
|
|
|
|
if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
2005-06-06 00:32:58 +02:00
|
|
|
if (find_minmax_aggs_walker(parse->havingQual, &aggs_list))
|
2010-11-04 17:01:17 +01:00
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* OK, there is at least the possibility of performing the optimization.
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* Build an access path for each aggregate. If any of the aggregates
|
|
|
|
* prove to be non-indexable, give up; there is no point in optimizing
|
|
|
|
* just some of them.
|
2010-11-04 17:01:17 +01:00
|
|
|
*/
|
|
|
|
foreach(lc, aggs_list)
|
|
|
|
{
|
|
|
|
MinMaxAggInfo *mminfo = (MinMaxAggInfo *) lfirst(lc);
|
2011-03-22 05:34:31 +01:00
|
|
|
Oid eqop;
|
|
|
|
bool reverse;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We'll need the equality operator that goes with the aggregate's
|
|
|
|
* ordering operator.
|
|
|
|
*/
|
|
|
|
eqop = get_equality_op_for_ordering_op(mminfo->aggsortop, &reverse);
|
2011-04-10 17:42:00 +02:00
|
|
|
if (!OidIsValid(eqop)) /* shouldn't happen */
|
2011-03-22 05:34:31 +01:00
|
|
|
elog(ERROR, "could not find equality operator for ordering operator %u",
|
|
|
|
mminfo->aggsortop);
|
2010-11-04 17:01:17 +01:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
/*
|
|
|
|
* We can use either an ordering that gives NULLS FIRST or one that
|
|
|
|
* gives NULLS LAST; furthermore there's unlikely to be much
|
|
|
|
* performance difference between them, so it doesn't seem worth
|
2014-05-06 18:12:18 +02:00
|
|
|
* costing out both ways if we get a hit on the first one. NULLS
|
2011-03-22 05:34:31 +01:00
|
|
|
* FIRST is more likely to be available if the operator is a
|
|
|
|
* reverse-sort operator, so try that first if reverse.
|
|
|
|
*/
|
|
|
|
if (build_minmax_path(root, mminfo, eqop, mminfo->aggsortop, reverse))
|
|
|
|
continue;
|
|
|
|
if (build_minmax_path(root, mminfo, eqop, mminfo->aggsortop, !reverse))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* No indexable path for this aggregate, so fail */
|
|
|
|
return;
|
2010-11-04 17:01:17 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* OK, we can do the query this way. Prepare to create a MinMaxAggPath
|
|
|
|
* node.
|
2011-03-22 05:34:31 +01:00
|
|
|
*
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* First, create an output Param node for each agg. (If we end up not
|
|
|
|
* using the MinMaxAggPath, we'll waste a PARAM_EXEC slot for each agg,
|
|
|
|
* which is not worth worrying about. We can't wait till create_plan time
|
|
|
|
* to decide whether to make the Param, unfortunately.)
|
2010-11-04 17:01:17 +01:00
|
|
|
*/
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
foreach(lc, aggs_list)
|
2005-04-12 01:06:57 +02:00
|
|
|
{
|
2010-11-04 17:01:17 +01:00
|
|
|
MinMaxAggInfo *mminfo = (MinMaxAggInfo *) lfirst(lc);
|
2005-04-12 01:06:57 +02:00
|
|
|
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
mminfo->param =
|
|
|
|
SS_make_initplan_output_param(root,
|
|
|
|
exprType((Node *) mminfo->target),
|
|
|
|
-1,
|
|
|
|
exprCollation((Node *) mminfo->target));
|
2005-04-12 01:06:57 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* Create a MinMaxAggPath node with the appropriate estimated costs and
|
|
|
|
* other needed data, and add it to the UPPERREL_GROUP_AGG upperrel, where
|
|
|
|
* it will compete against the standard aggregate implementation. (It
|
|
|
|
* will likely always win, but we need not assume that here.)
|
2010-11-04 17:01:17 +01:00
|
|
|
*
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* Note: grouping_planner won't have created this upperrel yet, but it's
|
2016-07-01 19:12:34 +02:00
|
|
|
* fine for us to create it first. We will not have inserted the correct
|
|
|
|
* consider_parallel value in it, but MinMaxAggPath paths are currently
|
|
|
|
* never parallel-safe anyway, so that doesn't matter. Likewise, it
|
|
|
|
* doesn't matter that we haven't filled FDW-related fields in the rel.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
|
|
|
|
add_path(grouped_rel, (Path *)
|
|
|
|
create_minmaxagg_path(root, grouped_rel,
|
|
|
|
create_pathtarget(root, tlist),
|
|
|
|
aggs_list,
|
|
|
|
(List *) parse->havingQual));
|
2005-04-12 01:06:57 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* find_minmax_aggs_walker
|
|
|
|
* Recursively scan the Aggref nodes in an expression tree, and check
|
|
|
|
* that each one is a MIN/MAX aggregate. If so, build a list of the
|
|
|
|
* distinct aggregate calls in the tree.
|
|
|
|
*
|
|
|
|
* Returns TRUE if a non-MIN/MAX aggregate is found, FALSE otherwise.
|
|
|
|
* (This seemingly-backward definition is used because expression_tree_walker
|
|
|
|
* aborts the scan on TRUE return, which is what we want.)
|
|
|
|
*
|
|
|
|
* Found aggregates are added to the list at *context; it's up to the caller
|
|
|
|
* to initialize the list to NIL.
|
|
|
|
*
|
|
|
|
* This does not descend into subqueries, and so should be used only after
|
|
|
|
* reduction of sublinks to subplans. There mustn't be outer-aggregate
|
|
|
|
* references either.
|
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
find_minmax_aggs_walker(Node *node, List **context)
|
|
|
|
{
|
|
|
|
if (node == NULL)
|
|
|
|
return false;
|
|
|
|
if (IsA(node, Aggref))
|
|
|
|
{
|
|
|
|
Aggref *aggref = (Aggref *) node;
|
|
|
|
Oid aggsortop;
|
2009-12-15 18:57:48 +01:00
|
|
|
TargetEntry *curTarget;
|
2010-11-04 17:01:17 +01:00
|
|
|
MinMaxAggInfo *mminfo;
|
2005-04-12 01:06:57 +02:00
|
|
|
ListCell *l;
|
|
|
|
|
|
|
|
Assert(aggref->agglevelsup == 0);
|
2013-07-17 02:14:37 +02:00
|
|
|
if (list_length(aggref->args) != 1)
|
2006-07-27 21:52:07 +02:00
|
|
|
return true; /* it couldn't be MIN/MAX */
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
|
2013-07-17 02:14:37 +02:00
|
|
|
/*
|
|
|
|
* ORDER BY is usually irrelevant for MIN/MAX, but it can change the
|
|
|
|
* outcome if the aggsortop's operator class recognizes non-identical
|
|
|
|
* values as equal. For example, 4.0 and 4.00 are equal according to
|
|
|
|
* numeric_ops, yet distinguishable. If MIN() receives more than one
|
|
|
|
* value equal to 4.0 and no value less than 4.0, it is unspecified
|
|
|
|
* which of those equal values MIN() returns. An ORDER BY expression
|
|
|
|
* that differs for each of those equal values of the argument
|
|
|
|
* expression makes the result predictable once again. This is a
|
|
|
|
* niche requirement, and we do not implement it with subquery paths.
|
Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 22:11:35 +01:00
|
|
|
* In any case, this test lets us reject ordered-set aggregates
|
|
|
|
* quickly.
|
2013-07-17 02:14:37 +02:00
|
|
|
*/
|
|
|
|
if (aggref->aggorder != NIL)
|
|
|
|
return true;
|
Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 22:11:35 +01:00
|
|
|
/* note: we do not care if DISTINCT is mentioned ... */
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
|
2013-07-17 02:15:36 +02:00
|
|
|
/*
|
|
|
|
* We might implement the optimization when a FILTER clause is present
|
Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 22:11:35 +01:00
|
|
|
* by adding the filter to the quals of the generated subquery. For
|
|
|
|
* now, just punt.
|
2013-07-17 02:15:36 +02:00
|
|
|
*/
|
|
|
|
if (aggref->aggfilter != NULL)
|
|
|
|
return true;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
|
|
|
|
if (!OidIsValid(aggsortop))
|
|
|
|
return true; /* not a MIN/MAX aggregate */
|
|
|
|
|
2013-07-17 02:14:37 +02:00
|
|
|
curTarget = (TargetEntry *) linitial(aggref->args);
|
|
|
|
|
2010-11-04 17:01:17 +01:00
|
|
|
if (contain_mutable_functions((Node *) curTarget->expr))
|
|
|
|
return true; /* not potentially indexable */
|
|
|
|
|
|
|
|
if (type_is_rowtype(exprType((Node *) curTarget->expr)))
|
|
|
|
return true; /* IS NOT NULL would have weird semantics */
|
|
|
|
|
2005-04-12 01:06:57 +02:00
|
|
|
/*
|
|
|
|
* Check whether it's already in the list, and add it if not.
|
|
|
|
*/
|
|
|
|
foreach(l, *context)
|
|
|
|
{
|
2010-11-04 17:01:17 +01:00
|
|
|
mminfo = (MinMaxAggInfo *) lfirst(l);
|
|
|
|
if (mminfo->aggfnoid == aggref->aggfnoid &&
|
|
|
|
equal(mminfo->target, curTarget->expr))
|
2005-04-12 01:06:57 +02:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2010-11-04 17:01:17 +01:00
|
|
|
mminfo = makeNode(MinMaxAggInfo);
|
|
|
|
mminfo->aggfnoid = aggref->aggfnoid;
|
|
|
|
mminfo->aggsortop = aggsortop;
|
|
|
|
mminfo->target = curTarget->expr;
|
2011-04-10 17:42:00 +02:00
|
|
|
mminfo->subroot = NULL; /* don't compute path yet */
|
2011-03-22 05:34:31 +01:00
|
|
|
mminfo->path = NULL;
|
|
|
|
mminfo->pathcost = 0;
|
|
|
|
mminfo->param = NULL;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
2010-11-04 17:01:17 +01:00
|
|
|
*context = lappend(*context, mminfo);
|
2005-04-12 01:06:57 +02:00
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* We need not recurse into the argument, since it can't contain any
|
|
|
|
* aggregates.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
Assert(!IsA(node, SubLink));
|
|
|
|
return expression_tree_walker(node, find_minmax_aggs_walker,
|
|
|
|
(void *) context);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2011-03-22 05:34:31 +01:00
|
|
|
* build_minmax_path
|
|
|
|
* Given a MIN/MAX aggregate, try to build an indexscan Path it can be
|
2010-11-04 17:01:17 +01:00
|
|
|
* optimized with.
|
2005-04-12 01:06:57 +02:00
|
|
|
*
|
2011-03-22 05:34:31 +01:00
|
|
|
* If successful, stash the best path in *mminfo and return TRUE.
|
|
|
|
* Otherwise, return FALSE.
|
2005-04-12 01:06:57 +02:00
|
|
|
*/
|
2011-03-22 05:34:31 +01:00
|
|
|
static bool
|
|
|
|
build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
|
|
|
|
Oid eqop, Oid sortop, bool nulls_first)
|
2005-04-12 01:06:57 +02:00
|
|
|
{
|
2011-03-22 05:34:31 +01:00
|
|
|
PlannerInfo *subroot;
|
|
|
|
Query *parse;
|
|
|
|
TargetEntry *tle;
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
List *tlist;
|
2011-03-22 05:34:31 +01:00
|
|
|
NullTest *ntest;
|
|
|
|
SortGroupClause *sortcl;
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
RelOptInfo *final_rel;
|
2011-03-22 05:34:31 +01:00
|
|
|
Path *sorted_path;
|
|
|
|
Cost path_cost;
|
2010-11-04 17:01:17 +01:00
|
|
|
double path_fraction;
|
2007-10-13 02:58:03 +02:00
|
|
|
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
/*
|
|
|
|
* We are going to construct what is effectively a sub-SELECT query, so
|
|
|
|
* clone the current query level's state and adjust it to make it look
|
|
|
|
* like a subquery. Any outer references will now be one level higher
|
|
|
|
* than before. (This means that when we are done, there will be no Vars
|
|
|
|
* of level 1, which is why the subquery can become an initplan.)
|
2007-10-13 02:58:03 +02:00
|
|
|
*/
|
2011-03-22 05:34:31 +01:00
|
|
|
subroot = (PlannerInfo *) palloc(sizeof(PlannerInfo));
|
|
|
|
memcpy(subroot, root, sizeof(PlannerInfo));
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
subroot->query_level++;
|
|
|
|
subroot->parent_root = root;
|
2015-08-12 05:48:37 +02:00
|
|
|
/* reset subplan-related stuff */
|
|
|
|
subroot->plan_params = NIL;
|
|
|
|
subroot->outer_params = NULL;
|
|
|
|
subroot->init_plans = NIL;
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
subroot->cte_plan_ids = NIL;
|
|
|
|
|
|
|
|
subroot->parse = parse = (Query *) copyObject(root->parse);
|
|
|
|
IncrementVarSublevelsUp((Node *) parse, 1, 1);
|
|
|
|
|
|
|
|
/* append_rel_list might contain outer Vars? */
|
|
|
|
subroot->append_rel_list = (List *) copyObject(root->append_rel_list);
|
|
|
|
IncrementVarSublevelsUp((Node *) subroot->append_rel_list, 1, 1);
|
2015-12-11 21:52:16 +01:00
|
|
|
/* There shouldn't be any OJ info to translate, as yet */
|
2011-03-22 05:34:31 +01:00
|
|
|
Assert(subroot->join_info_list == NIL);
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
/* and we haven't made equivalence classes, either */
|
|
|
|
Assert(subroot->eq_classes == NIL);
|
2011-03-22 05:34:31 +01:00
|
|
|
/* and we haven't created PlaceHolderInfos, either */
|
|
|
|
Assert(subroot->placeholder_list == NIL);
|
|
|
|
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
/*----------
|
|
|
|
* Generate modified query of the form
|
|
|
|
* (SELECT col FROM tab
|
|
|
|
* WHERE col IS NOT NULL AND existing-quals
|
|
|
|
* ORDER BY col ASC/DESC
|
|
|
|
* LIMIT 1)
|
|
|
|
*----------
|
|
|
|
*/
|
2011-03-22 05:34:31 +01:00
|
|
|
/* single tlist entry that is the aggregate target */
|
|
|
|
tle = makeTargetEntry(copyObject(mminfo->target),
|
|
|
|
(AttrNumber) 1,
|
|
|
|
pstrdup("agg_target"),
|
|
|
|
false);
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
tlist = list_make1(tle);
|
|
|
|
subroot->processed_tlist = parse->targetList = tlist;
|
2011-03-22 05:34:31 +01:00
|
|
|
|
|
|
|
/* No HAVING, no DISTINCT, no aggregates anymore */
|
|
|
|
parse->havingQual = NULL;
|
|
|
|
subroot->hasHavingQual = false;
|
|
|
|
parse->distinctClause = NIL;
|
|
|
|
parse->hasDistinctOn = false;
|
|
|
|
parse->hasAggs = false;
|
|
|
|
|
|
|
|
/* Build "target IS NOT NULL" expression */
|
|
|
|
ntest = makeNode(NullTest);
|
|
|
|
ntest->nulltesttype = IS_NOT_NULL;
|
|
|
|
ntest->arg = copyObject(mminfo->target);
|
|
|
|
/* we checked it wasn't a rowtype in find_minmax_aggs_walker */
|
|
|
|
ntest->argisrow = false;
|
2015-02-22 20:40:27 +01:00
|
|
|
ntest->location = -1;
|
2011-03-22 05:34:31 +01:00
|
|
|
|
|
|
|
/* User might have had that in WHERE already */
|
|
|
|
if (!list_member((List *) parse->jointree->quals, ntest))
|
|
|
|
parse->jointree->quals = (Node *)
|
|
|
|
lcons(ntest, (List *) parse->jointree->quals);
|
|
|
|
|
|
|
|
/* Build suitable ORDER BY clause */
|
|
|
|
sortcl = makeNode(SortGroupClause);
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
sortcl->tleSortGroupRef = assignSortGroupRef(tle, tlist);
|
2011-03-22 05:34:31 +01:00
|
|
|
sortcl->eqop = eqop;
|
|
|
|
sortcl->sortop = sortop;
|
|
|
|
sortcl->nulls_first = nulls_first;
|
2011-04-10 17:42:00 +02:00
|
|
|
sortcl->hashable = false; /* no need to make this accurate */
|
2011-03-22 05:34:31 +01:00
|
|
|
parse->sortClause = list_make1(sortcl);
|
|
|
|
|
|
|
|
/* set up expressions for LIMIT 1 */
|
|
|
|
parse->limitOffset = NULL;
|
2011-03-26 01:10:42 +01:00
|
|
|
parse->limitCount = (Node *) makeConst(INT8OID, -1, InvalidOid,
|
|
|
|
sizeof(int64),
|
2011-03-22 05:34:31 +01:00
|
|
|
Int64GetDatum(1), false,
|
|
|
|
FLOAT8PASSBYVAL);
|
2007-10-13 02:58:03 +02:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
/*
|
2011-04-10 17:42:00 +02:00
|
|
|
* Generate the best paths for this query, telling query_planner that we
|
|
|
|
* have LIMIT 1.
|
2011-03-22 05:34:31 +01:00
|
|
|
*/
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
subroot->tuple_fraction = 1.0;
|
|
|
|
subroot->limit_tuples = 1.0;
|
|
|
|
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
final_rel = query_planner(subroot, tlist, minmax_qp_callback, NULL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Since we didn't go through subquery_planner() to handle the subquery,
|
|
|
|
* we have to do some of the same cleanup it would do, in particular cope
|
|
|
|
* with params and initplans used within this subquery. (This won't
|
|
|
|
* matter if we end up not using the subplan.)
|
|
|
|
*/
|
|
|
|
SS_identify_outer_params(subroot);
|
|
|
|
SS_charge_for_initplans(subroot, final_rel);
|
2005-04-12 01:06:57 +02:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
/*
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
* Get the best presorted path, that being the one that's cheapest for
|
|
|
|
* fetching just one row. If there's no such path, fail.
|
2011-03-22 05:34:31 +01:00
|
|
|
*/
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
if (final_rel->rows > 1.0)
|
|
|
|
path_fraction = 1.0 / final_rel->rows;
|
|
|
|
else
|
|
|
|
path_fraction = 1.0;
|
|
|
|
|
|
|
|
sorted_path =
|
|
|
|
get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
|
|
|
|
subroot->query_pathkeys,
|
|
|
|
NULL,
|
|
|
|
path_fraction);
|
2011-03-22 05:34:31 +01:00
|
|
|
if (!sorted_path)
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 21:00:57 +02:00
|
|
|
return false;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
/*
|
|
|
|
* The path might not return exactly what we want, so fix that. (We
|
|
|
|
* assume that this won't change any conclusions about which was the
|
|
|
|
* cheapest path.)
|
|
|
|
*/
|
|
|
|
sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path,
|
Refactor planning of projection steps that don't need a Result plan node.
The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting). We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6. So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't. This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.
Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node. Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that. That was uncertainly correct and
definitely ugly/unmaintainable.
To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case. apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path. (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node. We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)
create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate. But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.
I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed. This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.
In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150. I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't. For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
2016-06-22 00:38:20 +02:00
|
|
|
create_pathtarget(subroot, tlist));
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
/*
|
|
|
|
* Determine cost to get just the first row of the presorted path.
|
|
|
|
*
|
|
|
|
* Note: cost calculation here should match
|
|
|
|
* compare_fractional_path_costs().
|
|
|
|
*/
|
|
|
|
path_cost = sorted_path->startup_cost +
|
|
|
|
path_fraction * (sorted_path->total_cost - sorted_path->startup_cost);
|
2005-04-12 01:06:57 +02:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
/* Save state for further processing */
|
|
|
|
mminfo->subroot = subroot;
|
|
|
|
mminfo->path = sorted_path;
|
|
|
|
mminfo->pathcost = path_cost;
|
2005-04-12 01:06:57 +02:00
|
|
|
|
2011-03-22 05:34:31 +01:00
|
|
|
return true;
|
2005-04-12 01:06:57 +02:00
|
|
|
}
|
|
|
|
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 20:49:01 +02:00
|
|
|
/*
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 21:58:22 +01:00
|
|
|
* Compute query_pathkeys and other pathkeys during query_planner()
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 20:49:01 +02:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
minmax_qp_callback(PlannerInfo *root, void *extra)
|
|
|
|
{
|
|
|
|
root->group_pathkeys = NIL;
|
|
|
|
root->window_pathkeys = NIL;
|
|
|
|
root->distinct_pathkeys = NIL;
|
|
|
|
|
|
|
|
root->sort_pathkeys =
|
|
|
|
make_pathkeys_for_sortclauses(root,
|
|
|
|
root->parse->sortClause,
|
|
|
|
root->parse->targetList);
|
|
|
|
|
|
|
|
root->query_pathkeys = root->sort_pathkeys;
|
|
|
|
}
|
|
|
|
|
2005-04-12 01:06:57 +02:00
|
|
|
/*
|
|
|
|
* Get the OID of the sort operator, if any, associated with an aggregate.
|
|
|
|
* Returns InvalidOid if there is no such operator.
|
|
|
|
*/
|
|
|
|
static Oid
|
|
|
|
fetch_agg_sort_op(Oid aggfnoid)
|
|
|
|
{
|
|
|
|
HeapTuple aggTuple;
|
|
|
|
Form_pg_aggregate aggform;
|
|
|
|
Oid aggsortop;
|
|
|
|
|
|
|
|
/* fetch aggregate entry from pg_aggregate */
|
2010-02-14 19:42:19 +01:00
|
|
|
aggTuple = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(aggfnoid));
|
2005-04-12 01:06:57 +02:00
|
|
|
if (!HeapTupleIsValid(aggTuple))
|
|
|
|
return InvalidOid;
|
|
|
|
aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple);
|
|
|
|
aggsortop = aggform->aggsortop;
|
|
|
|
ReleaseSysCache(aggTuple);
|
|
|
|
|
|
|
|
return aggsortop;
|
|
|
|
}
|