Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several
different GROUP BY clauses at once. Each grouping set returns rows in which
the columns grouped by only in the other sets are set to NULL.
Previously, the same result could be obtained only by running each grouping
as a separate query and combining the results with UNION ALL. Besides being
considerably more concise, grouping sets will in many cases be faster, since
they require only one scan over the underlying data.
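As an illustrative sketch (not part of the regression tests below), here is a
two-set GROUPING SETS query next to its pre-grouping-sets UNION ALL
equivalent; the inline VALUES list is hypothetical:

```sql
-- One scan, two groupings; columns not grouped by a set come back as NULL.
select a, b, count(*)
  from (values (1,1),(1,2),(2,1)) t(a,b)
 group by grouping sets ((a), (b));

-- Pre-grouping-sets equivalent: one scan of the source per grouping.
select a, null::int as b, count(*)
  from (values (1,1),(1,2),(2,1)) t(a,b) group by a
union all
select null::int as a, b, count(*)
  from (values (1,1),(1,2),(2,1)) t(a,b) group by b;
```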
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort and
aggregation steps are performed. These additional passes read their input
from the previous sort step, thus avoiding repeated scans of the source data.
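For example (a sketch, not one of the tests below), ROLLUP shows why sets
that share a sort order can be computed in a single pass:

```sql
-- ROLLUP (a,b) = GROUPING SETS ((a,b), (a), ()); all three sets are
-- computable from one input stream sorted by (a,b), so only one sort
-- and one scan of the source data are needed.
select a, b, count(*)
  from (values (1,1),(1,2),(2,1)) t(a,b)
 group by rollup (a,b);
```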
The code is structured so that support for pure hash aggregation, or a mix
of hashing and sorting, can be added later. Sorting was implemented first
because it is the most general method.
Unlike earlier versions of the patch, which represented the chain of sort
and aggregation steps as full-blown planner and executor nodes, all but the
first sort are now performed inside the aggregation node itself. This avoids
unusual gymnastics to return both aggregated and non-aggregated tuples from
underlying nodes, as well as having to shut down underlying nodes early to
limit memory usage. The optimizer still builds Sort/Agg nodes to describe
each phase, but they are not part of the plan tree; instead they are carried
as additional data by the aggregation node, being a convenient and
preexisting way to describe aggregation and sorting. The first (and possibly
only) sort step is still performed as a separate execution step. That
retains similarity with existing GROUP BY plans, makes rescans fairly
simple, avoids very deep plans (which lead to slow EXPLAINs), and makes it
easy to skip the sort when the underlying data is already sorted by other
means.
A somewhat ugly aspect of this patch is dealing with a grammar ambiguity
between the new CUBE keyword and the cube extension's functions named cube
(and rollup). To avoid breaking existing deployments of the cube extension,
the extension has not been renamed, nor has cube been made a reserved
keyword. Instead, precedence hacking is used so that GROUP BY cube(..)
refers to the CUBE grouping-sets feature, not the function cube(). To
actually group by a function named cube(), unlikely as that might be, the
function name has to be quoted.
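A sketch of the disambiguation (the second statement is shown as a comment
because it assumes the contrib cube extension and a column of a type its
cube() function accepts):

```sql
-- The keyword wins: this is the CUBE grouping-sets feature.
select a, b, count(*)
  from (values (1,1),(2,1)) t(a,b)
 group by cube(a, b);

-- To group by a call to a function actually named cube, quote the name:
--   ... group by "cube"(a);
```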
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
--
-- grouping sets
--

-- test data sources

create temp view gstest1(a,b,v)
  as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
            (2,3,15),
            (3,3,16),(3,4,17),
            (4,1,18),(4,1,19);

create temp table gstest2 (a integer, b integer, c integer, d integer,
                           e integer, f integer, g integer, h integer);
copy gstest2 from stdin;
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
1 1 1 1 2 2 2 2
1 1 1 2 2 2 2 2
1 1 2 2 2 2 2 2
1 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
\.

create temp table gstest3 (a integer, b integer, c integer, d integer);
copy gstest3 from stdin;
1 1 1 1
2 2 2 2
\.
alter table gstest3 add primary key (a);

create temp table gstest4(id integer, v integer,
                          unhashable_col bit(4), unsortable_col xid);
insert into gstest4
values (1,1,b'0000','1'), (2,2,b'0001','1'),
       (3,4,b'0010','2'), (4,8,b'0011','2'),
       (5,16,b'0000','2'), (6,32,b'0001','2'),
       (7,64,b'0010','1'), (8,128,b'0011','1');
create temp table gstest_empty (a integer, b integer, v integer);

create function gstest_data(v integer, out a integer, out b integer)
  returns setof record
  as $f$
    begin
      return query select v, i from generate_series(1,3) i;
    end;
  $f$ language plpgsql;

-- basic functionality

set enable_hashagg = false; -- test hashing explicitly later
-- simple rollup with multiple plain aggregates, with and without ordering
-- (and with ordering differing from grouping)
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b);
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by a,b;
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by b desc, a;
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by coalesce(a,0)+coalesce(b,0);

-- various types of ordered aggs
select a, b, grouping(a,b),
       array_agg(v order by v),
       string_agg(v::text, ':' order by v desc),
       percentile_disc(0.5) within group (order by v),
       rank(1,2,12) within group (order by a,b,v)
  from gstest1 group by rollup (a,b) order by a,b;

-- test usage of grouped columns in direct args of aggs
select grouping(a), a, array_agg(b),
       rank(a) within group (order by b nulls first),
       rank(a) within group (order by b nulls last)
  from (values (1,1),(1,4),(1,5),(3,1),(3,2)) v(a,b)
 group by rollup (a) order by a;

-- nesting with window functions
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
  from gstest2 group by rollup (a,b) order by rsum, a, b;
-- nesting with grouping sets
select sum(c) from gstest2
  group by grouping sets((), grouping sets((), grouping sets(())))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets((), grouping sets((), grouping sets(((a, b)))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(rollup(c), grouping sets(cube(c))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(a, grouping sets(a, cube(b)))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets((a, (b))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets((a, b)))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(a, grouping sets(a), a))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(a, grouping sets(a, grouping sets(a), ((a)), a, grouping sets(a), (a)), a))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets((a,(a,b)), grouping sets((a,(a,b)),a))
  order by 1 desc;
-- empty input: first is 0 rows, second 1, third 3 etc.
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());

-- empty input with joins tests some important code paths
select t1.a, t2.b, sum(t1.v), count(*) from gstest_empty t1, gstest_empty t2
 group by grouping sets ((t1.a,t2.b),());

-- simple joins, var resolution, GROUPING on join vars
select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
  from gstest1 t1, gstest2 t2
 group by grouping sets ((t1.a, t2.b), ());

select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
  from gstest1 t1 join gstest2 t2 on (t1.a=t2.a)
 group by grouping sets ((t1.a, t2.b), ());

select a, b, grouping(a, b), sum(t1.v), max(t2.c)
  from gstest1 t1 join gstest2 t2 using (a,b)
 group by grouping sets ((a, b), ());

-- check that functionally dependent cols are not nulled
select a, d, grouping(a,b,c)
  from gstest3
 group by grouping sets ((a,b), (a,c));
Make setrefs.c match by ressortgroupref even for plain Vars.
Previously, we skipped using search_indexed_tlist_for_sortgroupref()
if the tlist expression being sought in the child plan node was merely
a Var. This is purely an optimization, based on the theory that
search_indexed_tlist_for_var() is faster, and one copy of a Var should
be as good as another. However, the GROUPING SETS patch broke the
latter assumption: grouping columns containing the "same" Var can
sometimes have different outputs, as shown in the test case added here.
So do it the hard way whenever a ressortgroupref marking exists.
(If this seems like a bottleneck, we could imagine building a tlist index
data structure for ressortgroupref values, as we do for Vars. But I'll
let that idea go until there's some evidence it's worthwhile.)
Back-patch to 9.6. The problem also exists in 9.5 where GROUPING SETS
came in, but this patch is insufficient to resolve the problem in 9.5:
there is some obscure dependency on the upper-planner-pathification
work that happened in 9.6. Given that this is such a weird corner case,
and no end users have complained about it, it doesn't seem worth the work
to develop a fix for 9.5.
Patch by me, per a report from Heikki Linnakangas. (This does not fix
Heikki's original complaint, just the follow-on one.)
Discussion: https://postgr.es/m/aefc657e-edb2-64d5-6df1-a0828f6e9104@iki.fi
2017-10-26 18:17:40 +02:00
-- check that distinct grouping columns are kept separate
-- even if they are equal()
explain (costs off)
select g as alias1, g as alias2
  from generate_series(1,3) g
 group by alias1, rollup(alias2);

select g as alias1, g as alias2
  from generate_series(1,3) g
 group by alias1, rollup(alias2);
-- check that pulled-up subquery outputs still go to null when appropriate
select four, x
  from (select four, ten, 'foo'::text as x from tenk1) as t
 group by grouping sets (four, x)
having x = 'foo';

select four, x || 'x'
  from (select four, ten, 'foo'::text as x from tenk1) as t
 group by grouping sets (four, x)
 order by four;

select (x+y)*1, sum(z)
  from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x+y, x);

select x, not x as not_x, q2 from
  (select *, q1 = 1 as x from int8_tbl i1) as t
 group by grouping sets(x, q2)
 order by x, q2;
-- simple rescan tests

select a, b, sum(v.x)
  from (values (1),(2)) v(x), gstest_data(v.x)
 group by rollup (a,b);

select *
  from (values (1),(2)) v(x),
       lateral (select a, b, sum(v.x) from gstest_data(v.x) group by rollup (a,b)) s;

-- min max optimization should still work with GROUP BY ()
explain (costs off)
  select min(unique1) from tenk1 GROUP BY ();

-- Views with GROUPING SET queries
CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
  from gstest2 group by rollup ((a,b,c),(c,d));

select pg_get_viewdef('gstest_view'::regclass, true);

-- Nested queries with 3 or more levels of nesting
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
select(select (select grouping(e,f) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
select(select (select grouping(c) from (values (1)) v2(c) GROUP BY c) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);

-- Combinations of operations
select a, b, c, d from gstest2 group by rollup(a,b),grouping sets(c,d);
select a, b from (values (1,2),(2,3)) v(a,b) group by a,b, grouping sets(a);

-- Tests for chained aggregates
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
|
|
|
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP((e+1),(f+1));
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY CUBE((e+1),(f+1)) ORDER BY (e+1),(f+1);

select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
  from gstest2 group by cube (a,b) order by rsum, a, b;

select a, b, sum(c) from (values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),(2,3,15),(3,3,16),(3,4,17),(4,1,18),(4,1,19)) v(a,b,c) group by rollup (a,b);

select a, b, sum(v.x)
  from (values (1),(2)) v(x), gstest_data(v.x)
  group by cube (a,b) order by a,b;

-- Test reordering of grouping sets
explain (costs off)
select * from gstest1 group by grouping sets((a,b,v),(v)) order by v,b,a;
-- Agg level check. This query should error out.
select (select grouping(a,b) from gstest2) from gstest2 group by a,b;

-- Nested queries
select a, b, sum(c), count(*) from gstest2 group by grouping sets (rollup(a,b),a);

-- HAVING queries
select ten, sum(distinct four) from onek a
  group by grouping sets((ten,four),(ten))
  having exists (select 1 from onek b where sum(distinct a.four) = b.four);

-- Tests around pushdown of HAVING clauses, partially testing against previous bugs
select a,count(*) from gstest2 group by rollup(a) order by a;
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;
explain (costs off)
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;

select v.c, (select count(*) from gstest2 group by () having v.c)
  from (values (false),(true)) v(c) order by v.c;
explain (costs off)
select v.c, (select count(*) from gstest2 group by () having v.c)
  from (values (false),(true)) v(c) order by v.c;

-- HAVING with GROUPING queries
select ten, grouping(ten) from onek
  group by grouping sets(ten) having grouping(ten) >= 0
  order by 2,1;
select ten, grouping(ten) from onek
  group by grouping sets(ten, four) having grouping(ten) > 0
  order by 2,1;
select ten, grouping(ten) from onek
  group by rollup(ten) having grouping(ten) > 0
  order by 2,1;
select ten, grouping(ten) from onek
  group by cube(ten) having grouping(ten) > 0
  order by 2,1;
select ten, grouping(ten) from onek
  group by (ten) having grouping(ten) >= 0
  order by 2,1;
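The GROUPING() calls above return an integer bit mask: reading the arguments left to right, each bit is 1 when that argument is absent from the current grouping set, with the first argument in the most significant position. An illustrative Python model of that rule (hypothetical helper, not part of the suite):

```python
def grouping(grouped_cols, args):
    """Mimic SQL GROUPING(): bit i (counting from the left) is 1 when
    args[i] is not part of the current grouping set."""
    result = 0
    for col in args:
        result = (result << 1) | (0 if col in grouped_cols else 1)
    return result

# For GROUP BY ROLLUP(a, b), the three grouping sets give:
print(grouping({"a", "b"}, ["a", "b"]))  # (a,b) -> 0
print(grouping({"a"}, ["a", "b"]))       # (a)   -> 1
print(grouping(set(), ["a", "b"]))       # ()    -> 3
```

That is why `having grouping(ten) > 0` above keeps only the rows produced by sets that do not group by `ten`.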
-- FILTER queries
select ten, sum(distinct four) filter (where four::text ~ '123') from onek a
  group by rollup(ten);

-- More rescan tests
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- Grouping on text columns
select sum(ten) from onek group by two, rollup(four::text) order by 1;
select sum(ten) from onek group by rollup(four::text), two order by 1;
-- hashing support

set enable_hashagg = true;

-- failure cases
select count(*) from gstest4 group by rollup(unhashable_col,unsortable_col);
select array_agg(v order by v) from gstest4 group by grouping sets ((id,unsortable_col),(id));

-- simple cases
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;

select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by cube(a,b) order by 3,1,2;
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by cube(a,b) order by 3,1,2;

-- shouldn't try and hash
explain (costs off)
  select a, b, grouping(a,b), array_agg(v order by v)
    from gstest1 group by cube(a,b);
-- unsortable cases
select unsortable_col, count(*)
  from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
  order by unsortable_col::text;
-- mixed hashable/sortable cases
select unhashable_col, unsortable_col,
       grouping(unhashable_col, unsortable_col),
       count(*), sum(v)
  from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
  order by 3, 5;
explain (costs off)
  select unhashable_col, unsortable_col,
         grouping(unhashable_col, unsortable_col),
         count(*), sum(v)
    from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
    order by 3,5;

select unhashable_col, unsortable_col,
       grouping(unhashable_col, unsortable_col),
       count(*), sum(v)
  from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
  order by 3,5;
explain (costs off)
  select unhashable_col, unsortable_col,
         grouping(unhashable_col, unsortable_col),
         count(*), sum(v)
    from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
    order by 3,5;
-- empty input: first is 0 rows, second 1, third 3 etc.
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
explain (costs off)
  select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
explain (costs off)
  select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
explain (costs off)
  select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
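The row counts noted in the comment above follow from one rule: a non-empty grouping set yields one row per distinct key value (so zero rows over empty input), while each empty set () behaves like an ungrouped aggregate and always yields exactly one row. A small Python sketch of that counting rule (illustrative only, not part of the test suite):

```python
def output_rows(input_rows, grouping_sets):
    """Count the rows a grouping-sets query returns: each non-empty
    set contributes one row per distinct grouping-key value; each
    empty set () contributes exactly one row, like an aggregate
    query with no GROUP BY at all."""
    total = 0
    for gset in grouping_sets:
        if gset:
            total += len({tuple(row[c] for c in gset) for row in input_rows})
        else:
            total += 1
    return total

empty = []
print(output_rows(empty, [("a", "b"), ("a",)]))      # 0
print(output_rows(empty, [("a", "b"), ()]))          # 1
print(output_rows(empty, [("a", "b"), (), (), ()]))  # 3
print(output_rows(empty, [(), (), ()]))              # 3
```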
-- check that functionally dependent cols are not nulled
select a, d, grouping(a,b,c)
  from gstest3
  group by grouping sets ((a,b), (a,c));
explain (costs off)
  select a, d, grouping(a,b,c)
    from gstest3
    group by grouping sets ((a,b), (a,c));
-- simple rescan tests

select a, b, sum(v.x)
  from (values (1),(2)) v(x), gstest_data(v.x)
  group by grouping sets (a,b)
  order by 1, 2, 3;
explain (costs off)
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
    group by grouping sets (a,b)
    order by 3, 1, 2;
select *
  from (values (1),(2)) v(x),
       lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;
explain (costs off)
  select *
    from (values (1),(2)) v(x),
         lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;
-- Tests for chained aggregates
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
explain (costs off)
  select a, b, grouping(a,b), sum(v), count(*), max(v)
    from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
  from gstest2 group by cube (a,b) order by rsum, a, b;
explain (costs off)
  select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
    from gstest2 group by cube (a,b) order by rsum, a, b;
select a, b, sum(v.x)
  from (values (1),(2)) v(x), gstest_data(v.x)
  group by cube (a,b) order by a,b;
explain (costs off)
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
    group by cube (a,b) order by a,b;
Fix slot type handling for Agg nodes performing internal sorts.
Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1
rely on - the slot type for an expression's
ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged
as such. That allows to either skip deforming (for a virtual tuple
slot) or optimize the code for JIT accelerated deforming
appropriately (for other known slot types).
This assumption was sometimes violated for grouping sets, when
nodeAgg.c internally uses tuplesorts, and the child node doesn't
return a TTSOpsMinimalTuple type slot. Detect that case, and flag that
the outer slot might not be "fixed".
It's probably worthwhile to optimize this further in the future, and
more granularly determine whether the slot is fixed. As we already
instantiate per-phase transition and equal expressions, we could
cheaply set the slot type appropriately for each phase. But that's a
separate change from this bugfix.
This commit does include a very minor optimization by avoiding to
create a slot for handling tuplesorts, if no such sorts are
performed. Previously we created that slot unnecessarily in the common
case of computing all grouping sets via hashing. The code looked too
confusing without that, as the conditions for needing a sort slot and
flagging that the slot type isn't fixed, are the same.
Reported-By: Ashutosh Sharma
Author: Andres Freund
Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
-- Verify that we correctly handle the child node returning a
-- non-minimal slot, which happens if the input is pre-sorted,
-- e.g. due to an index scan.
BEGIN;
SET LOCAL enable_hashagg = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SET LOCAL enable_seqscan = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
COMMIT;
-- More rescan tests
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- Rescan logic changes when there are no empty grouping sets, so test
-- that too:
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by grouping sets(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by grouping sets(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- test the knapsack

set enable_indexscan = false;
set work_mem = '64kB';
explain (costs off)
  select unique1,
         count(two), count(four), count(ten),
         count(hundred), count(thousand), count(twothousand),
         count(*)
    from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);
explain (costs off)
  select unique1,
         count(two), count(four), count(ten),
         count(hundred), count(thousand), count(twothousand),
         count(*)
    from tenk1 group by grouping sets (unique1,hundred,ten,four,two);

set work_mem = '384kB';
explain (costs off)
  select unique1,
         count(two), count(four), count(ten),
         count(hundred), count(thousand), count(twothousand),
         count(*)
    from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);
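The "knapsack" being tested above is the planner's choice of which grouping sets to compute by hashing within the work_mem budget, which has the shape of a 0/1 knapsack problem. A generic dynamic-programming sketch of that problem shape (illustrative only; the item values and costs here are made up, and this is not the planner's actual code):

```python
def knapsack(items, budget):
    """Classic 0/1 knapsack: choose the subset of (value, cost) items
    with maximum total value whose total cost fits in the budget."""
    best = {0: (0, ())}  # total cost -> (total value, chosen items)
    for value, cost in items:
        # snapshot so each item is used at most once (0/1, not unbounded)
        for used, (val, chosen) in list(best.items()):
            new_cost = used + cost
            if new_cost <= budget and (new_cost not in best
                                       or best[new_cost][0] < val + value):
                best[new_cost] = (val + value, chosen + ((value, cost),))
    return max(best.values())

# e.g. three hashable sets whose estimated hash tables cost 40/50/30 "kB",
# each worth one avoided sort pass (value 1), under a 64kB budget:
print(knapsack([(1, 40), (1, 50), (1, 30)], 64)[0])  # 1
```

Raising the budget to 80 would fit the 40kB and 30kB tables together, which mirrors why the plans above change between the 64kB and 384kB work_mem settings.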
-- check collation-sensitive matching between grouping expressions
-- (similar to a check for aggregates, but there are additional code
-- paths for GROUPING, so check again here)

select v||'a', case grouping(v||'a') when 1 then 1 else 0 end, count(*)
  from unnest(array[1,1], array['a','b']) u(i,v)
  group by rollup(i, v||'a') order by 1,3;
select v||'a', case when grouping(v||'a') = 1 then 1 else 0 end, count(*)
  from unnest(array[1,1], array['a','b']) u(i,v)
  group by rollup(i, v||'a') order by 1,3;
-- end