postgresql/src/test/regress/expected/groupingsets.out

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

2154 lines
59 KiB
Plaintext
Raw Normal View History

Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
--
-- grouping sets
--
-- test data sources
create temp view gstest1(a,b,v)
as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
(2,3,15),
(3,3,16),(3,4,17),
(4,1,18),(4,1,19);
create temp table gstest2 (a integer, b integer, c integer, d integer,
e integer, f integer, g integer, h integer);
copy gstest2 from stdin;
create temp table gstest3 (a integer, b integer, c integer, d integer);
copy gstest3 from stdin;
alter table gstest3 add primary key (a);
create temp table gstest4(id integer, v integer,
unhashable_col bit(4), unsortable_col xid);
insert into gstest4
values (1,1,b'0000','1'), (2,2,b'0001','1'),
(3,4,b'0010','2'), (4,8,b'0011','2'),
(5,16,b'0000','2'), (6,32,b'0001','2'),
(7,64,b'0010','1'), (8,128,b'0011','1');
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
create temp table gstest_empty (a integer, b integer, v integer);
create function gstest_data(v integer, out a integer, out b integer)
returns setof record
as $f$
begin
return query select v, i from generate_series(1,3) i;
end;
$f$ language plpgsql;
-- basic functionality
set enable_hashagg = false; -- test hashing explicitly later
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- simple rollup with multiple plain aggregates, with and without ordering
-- (and with ordering differing from grouping)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by rollup (a,b);
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | 1 | 0 | 21 | 2 | 11
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
1 | | 1 | 60 | 5 | 14
2 | 3 | 0 | 15 | 1 | 15
2 | | 1 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
3 | | 1 | 33 | 2 | 17
4 | 1 | 0 | 37 | 2 | 19
4 | | 1 | 37 | 2 | 19
| | 3 | 145 | 10 | 19
(12 rows)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by rollup (a,b) order by a,b;
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | 1 | 0 | 21 | 2 | 11
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
1 | | 1 | 60 | 5 | 14
2 | 3 | 0 | 15 | 1 | 15
2 | | 1 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
3 | | 1 | 33 | 2 | 17
4 | 1 | 0 | 37 | 2 | 19
4 | | 1 | 37 | 2 | 19
| | 3 | 145 | 10 | 19
(12 rows)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by rollup (a,b) order by b desc, a;
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | | 1 | 60 | 5 | 14
2 | | 1 | 15 | 1 | 15
3 | | 1 | 33 | 2 | 17
4 | | 1 | 37 | 2 | 19
| | 3 | 145 | 10 | 19
3 | 4 | 0 | 17 | 1 | 17
1 | 3 | 0 | 14 | 1 | 14
2 | 3 | 0 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
1 | 2 | 0 | 25 | 2 | 13
1 | 1 | 0 | 21 | 2 | 11
4 | 1 | 0 | 37 | 2 | 19
(12 rows)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by rollup (a,b) order by coalesce(a,0)+coalesce(b,0);
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
| | 3 | 145 | 10 | 19
1 | | 1 | 60 | 5 | 14
1 | 1 | 0 | 21 | 2 | 11
2 | | 1 | 15 | 1 | 15
3 | | 1 | 33 | 2 | 17
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
4 | | 1 | 37 | 2 | 19
4 | 1 | 0 | 37 | 2 | 19
2 | 3 | 0 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
(12 rows)
-- various types of ordered aggs
select a, b, grouping(a,b),
array_agg(v order by v),
string_agg(v::text, ':' order by v desc),
percentile_disc(0.5) within group (order by v),
rank(1,2,12) within group (order by a,b,v)
from gstest1 group by rollup (a,b) order by a,b;
a | b | grouping | array_agg | string_agg | percentile_disc | rank
---+---+----------+---------------------------------+-------------------------------+-----------------+------
1 | 1 | 0 | {10,11} | 11:10 | 10 | 3
1 | 2 | 0 | {12,13} | 13:12 | 12 | 1
1 | 3 | 0 | {14} | 14 | 14 | 1
1 | | 1 | {10,11,12,13,14} | 14:13:12:11:10 | 12 | 3
2 | 3 | 0 | {15} | 15 | 15 | 1
2 | | 1 | {15} | 15 | 15 | 1
3 | 3 | 0 | {16} | 16 | 16 | 1
3 | 4 | 0 | {17} | 17 | 17 | 1
3 | | 1 | {16,17} | 17:16 | 16 | 1
4 | 1 | 0 | {18,19} | 19:18 | 18 | 1
4 | | 1 | {18,19} | 19:18 | 18 | 1
| | 3 | {10,11,12,13,14,15,16,17,18,19} | 19:18:17:16:15:14:13:12:11:10 | 14 | 3
(12 rows)
-- test usage of grouped columns in direct args of aggs
select grouping(a), a, array_agg(b),
rank(a) within group (order by b nulls first),
rank(a) within group (order by b nulls last)
from (values (1,1),(1,4),(1,5),(3,1),(3,2)) v(a,b)
group by rollup (a) order by a;
grouping | a | array_agg | rank | rank
----------+---+-------------+------+------
0 | 1 | {1,4,5} | 1 | 1
0 | 3 | {1,2} | 3 | 3
1 | | {1,4,5,1,2} | 1 | 6
(3 rows)
-- nesting with window functions
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by rollup (a,b) order by rsum, a, b;
a | b | sum | rsum
---+---+-----+------
1 | 1 | 8 | 8
1 | 2 | 2 | 10
1 | | 10 | 20
2 | 2 | 2 | 22
2 | | 2 | 24
| | 12 | 36
(6 rows)
-- nesting with grouping sets
select sum(c) from gstest2
group by grouping sets((), grouping sets((), grouping sets(())))
order by 1 desc;
sum
-----
12
12
12
(3 rows)
select sum(c) from gstest2
group by grouping sets((), grouping sets((), grouping sets(((a, b)))))
order by 1 desc;
sum
-----
12
12
8
2
2
(5 rows)
select sum(c) from gstest2
group by grouping sets(grouping sets(rollup(c), grouping sets(cube(c))))
order by 1 desc;
sum
-----
12
12
6
6
6
6
(6 rows)
select sum(c) from gstest2
group by grouping sets(a, grouping sets(a, cube(b)))
order by 1 desc;
sum
-----
12
10
10
8
4
2
2
(7 rows)
select sum(c) from gstest2
group by grouping sets(grouping sets((a, (b))))
order by 1 desc;
sum
-----
8
2
2
(3 rows)
select sum(c) from gstest2
group by grouping sets(grouping sets((a, b)))
order by 1 desc;
sum
-----
8
2
2
(3 rows)
select sum(c) from gstest2
group by grouping sets(grouping sets(a, grouping sets(a), a))
order by 1 desc;
sum
-----
10
10
10
2
2
2
(6 rows)
select sum(c) from gstest2
group by grouping sets(grouping sets(a, grouping sets(a, grouping sets(a), ((a)), a, grouping sets(a), (a)), a))
order by 1 desc;
sum
-----
10
10
10
10
10
10
10
10
2
2
2
2
2
2
2
2
(16 rows)
select sum(c) from gstest2
group by grouping sets((a,(a,b)), grouping sets((a,(a,b)),a))
order by 1 desc;
sum
-----
10
8
8
2
2
2
2
2
(8 rows)
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- empty input: first is 0 rows, second 1, third 3 etc.
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
a | b | sum | count
---+---+-----+-------
(0 rows)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
a | b | sum | count
---+---+-----+-------
| | | 0
(1 row)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
a | b | sum | count
---+---+-----+-------
| | | 0
| | | 0
| | | 0
(3 rows)
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
sum | count
-----+-------
| 0
| 0
| 0
(3 rows)
-- empty input with joins tests some important code paths
select t1.a, t2.b, sum(t1.v), count(*) from gstest_empty t1, gstest_empty t2
group by grouping sets ((t1.a,t2.b),());
a | b | sum | count
---+---+-----+-------
| | | 0
(1 row)
-- simple joins, var resolution, GROUPING on join vars
select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
from gstest1 t1, gstest2 t2
group by grouping sets ((t1.a, t2.b), ());
a | b | grouping | sum | max
---+---+----------+------+-----
1 | 1 | 0 | 420 | 1
1 | 2 | 0 | 120 | 2
2 | 1 | 0 | 105 | 1
2 | 2 | 0 | 30 | 2
3 | 1 | 0 | 231 | 1
3 | 2 | 0 | 66 | 2
4 | 1 | 0 | 259 | 1
4 | 2 | 0 | 74 | 2
| | 3 | 1305 | 2
(9 rows)
select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
from gstest1 t1 join gstest2 t2 on (t1.a=t2.a)
group by grouping sets ((t1.a, t2.b), ());
a | b | grouping | sum | max
---+---+----------+-----+-----
1 | 1 | 0 | 420 | 1
1 | 2 | 0 | 60 | 1
2 | 2 | 0 | 15 | 2
| | 3 | 495 | 2
(4 rows)
select a, b, grouping(a, b), sum(t1.v), max(t2.c)
from gstest1 t1 join gstest2 t2 using (a,b)
group by grouping sets ((a, b), ());
a | b | grouping | sum | max
---+---+----------+-----+-----
1 | 1 | 0 | 147 | 2
1 | 2 | 0 | 25 | 2
| | 3 | 172 | 2
(3 rows)
-- check that functionally dependent cols are not nulled
select a, d, grouping(a,b,c)
from gstest3
group by grouping sets ((a,b), (a,c));
a | d | grouping
---+---+----------
1 | 1 | 1
2 | 2 | 1
1 | 1 | 2
2 | 2 | 2
(4 rows)
-- check that distinct grouping columns are kept separate
-- even if they are equal()
explain (costs off)
select g as alias1, g as alias2
from generate_series(1,3) g
group by alias1, rollup(alias2);
QUERY PLAN
------------------------------------------------
GroupAggregate
Group Key: g, g
Group Key: g
-> Sort
Sort Key: g
-> Function Scan on generate_series g
(6 rows)
select g as alias1, g as alias2
from generate_series(1,3) g
group by alias1, rollup(alias2);
alias1 | alias2
--------+--------
1 | 1
1 |
2 | 2
2 |
3 | 3
3 |
(6 rows)
-- check that pulled-up subquery outputs still go to null when appropriate
select four, x
from (select four, ten, 'foo'::text as x from tenk1) as t
group by grouping sets (four, x)
having x = 'foo';
four | x
------+-----
| foo
(1 row)
select four, x || 'x'
from (select four, ten, 'foo'::text as x from tenk1) as t
group by grouping sets (four, x)
order by four;
four | ?column?
------+----------
0 |
1 |
2 |
3 |
| foox
(5 rows)
select (x+y)*1, sum(z)
from (select 1 as x, 2 as y, 3 as z) s
group by grouping sets (x+y, x);
?column? | sum
----------+-----
3 | 3
| 3
(2 rows)
select x, not x as not_x, q2 from
(select *, q1 = 1 as x from int8_tbl i1) as t
group by grouping sets(x, q2)
order by x, q2;
x | not_x | q2
---+-------+-------------------
f | t |
| | -4567890123456789
| | 123
| | 456
| | 4567890123456789
(5 rows)
-- check qual push-down rules for a subquery with grouping sets
explain (verbose, costs off)
select * from (
select 1 as x, q1, sum(q2)
from int8_tbl i1
group by grouping sets(1, 2)
) ss
where x = 1 and q1 = 123;
QUERY PLAN
--------------------------------------------
Subquery Scan on ss
Output: ss.x, ss.q1, ss.sum
Filter: ((ss.x = 1) AND (ss.q1 = 123))
-> GroupAggregate
Output: (1), i1.q1, sum(i1.q2)
Group Key: 1
Sort Key: i1.q1
Group Key: i1.q1
-> Seq Scan on public.int8_tbl i1
Output: 1, i1.q1, i1.q2
(10 rows)
select * from (
select 1 as x, q1, sum(q2)
from int8_tbl i1
group by grouping sets(1, 2)
) ss
where x = 1 and q1 = 123;
x | q1 | sum
---+----+-----
(0 rows)
-- check handling of pulled-up SubPlan in GROUPING() argument (bug #17479)
explain (verbose, costs off)
select grouping(ss.x)
from int8_tbl i1
cross join lateral (select (select i1.q1) as x) ss
group by ss.x;
QUERY PLAN
------------------------------------------------
GroupAggregate
Output: GROUPING((SubPlan 1)), ((SubPlan 2))
Group Key: ((SubPlan 2))
-> Sort
Output: ((SubPlan 2)), i1.q1
Sort Key: ((SubPlan 2))
-> Seq Scan on public.int8_tbl i1
Output: (SubPlan 2), i1.q1
SubPlan 2
-> Result
Output: i1.q1
(11 rows)
select grouping(ss.x)
from int8_tbl i1
cross join lateral (select (select i1.q1) as x) ss
group by ss.x;
grouping
----------
0
0
(2 rows)
explain (verbose, costs off)
select (select grouping(ss.x))
from int8_tbl i1
cross join lateral (select (select i1.q1) as x) ss
group by ss.x;
QUERY PLAN
--------------------------------------------
GroupAggregate
Output: (SubPlan 2), ((SubPlan 3))
Group Key: ((SubPlan 3))
-> Sort
Output: ((SubPlan 3)), i1.q1
Sort Key: ((SubPlan 3))
-> Seq Scan on public.int8_tbl i1
Output: (SubPlan 3), i1.q1
SubPlan 3
-> Result
Output: i1.q1
SubPlan 2
-> Result
Output: GROUPING((SubPlan 1))
(14 rows)
select (select grouping(ss.x))
from int8_tbl i1
cross join lateral (select (select i1.q1) as x) ss
group by ss.x;
grouping
----------
0
0
(2 rows)
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- simple rescan tests
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by rollup (a,b);
a | b | sum
---+---+-----
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
1 | | 3
2 | 1 | 2
2 | 2 | 2
2 | 3 | 2
2 | | 6
| | 9
(9 rows)
select *
from (values (1),(2)) v(x),
lateral (select a, b, sum(v.x) from gstest_data(v.x) group by rollup (a,b)) s;
ERROR: aggregate functions are not allowed in FROM clause of their own query level
LINE 3: lateral (select a, b, sum(v.x) from gstest_data(v.x) ...
^
-- min max optimization should still work with GROUP BY ()
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
explain (costs off)
select min(unique1) from tenk1 GROUP BY ();
QUERY PLAN
------------------------------------------------------------
Result
InitPlan 1
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-> Limit
-> Index Only Scan using tenk1_unique1 on tenk1
Index Cond: (unique1 IS NOT NULL)
(5 rows)
-- Views with GROUPING SET queries
CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
from gstest2 group by rollup ((a,b,c),(c,d));
NOTICE: view "gstest_view" will be a temporary view
select pg_get_viewdef('gstest_view'::regclass, true);
Get rid of the "new" and "old" entries in a view's rangetable. The rule system needs "old" and/or "new" pseudo-RTEs in rule actions that are ON INSERT/UPDATE/DELETE. Historically it's put such entries into the ON SELECT rules of views as well, but those are really quite vestigial. The only thing we've used them for is to carry the view's relid forward to AcquireExecutorLocks (so that we can re-lock the view to verify it hasn't changed before re-using a plan) and to carry its relid and permissions data forward to execution-time permissions checks. What we can do instead of that is to retain these fields of the RTE_RELATION RTE for the view even after we convert it to an RTE_SUBQUERY RTE. This requires a tiny amount of extra complication in the planner and AcquireExecutorLocks, but on the other hand we can get rid of the logic that moves that data from one place to another. The principal immediate benefit of doing this, aside from a small saving in the pg_rewrite data for views, is that these pseudo-RTEs no longer trigger ruleutils.c's heuristic about qualifying variable names when the rangetable's length is more than 1. That results in quite a number of small simplifications in regression test outputs, which are all to the good IMO. Bump catversion because we need to dump a few more fields of RTE_SUBQUERY RTEs. While those will always be zeroes anyway in stored rules (because we'd never populate them until query rewrite) they are useful for debugging, and it seems like we'd better make sure to transmit such RTEs accurately in plans sent to parallel workers. I don't think the executor actually examines these fields after startup, but someday it might. This is a second attempt at committing 1b4d280ea. The difference from the first time is that now we can add some filtering rules to AdjustUpgrade.pm to allow cross-version upgrade testing to pass despite all the cosmetic changes in CREATE VIEW outputs. Amit Langote (filtering rules by me) Discussion: https://postgr.es/m/CA+HiwqEf7gPN4Hn+LoZ4tP2q_Qt7n3vw7-6fJKOf92tSEnX6Gg@mail.gmail.com Discussion: https://postgr.es/m/891521.1673657296@sss.pgh.pa.us
2023-01-18 19:23:57 +01:00
pg_get_viewdef
---------------------------------------
SELECT a, +
b, +
GROUPING(a, b) AS "grouping", +
sum(c) AS sum, +
count(*) AS count, +
max(c) AS max +
FROM gstest2 +
GROUP BY ROLLUP((a, b, c), (c, d));
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
(1 row)
-- Nested queries with 3 or more levels of nesting
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
grouping
----------
0
0
0
(3 rows)
select(select (select grouping(e,f) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
grouping
----------
0
1
3
(3 rows)
select(select (select grouping(c) from (values (1)) v2(c) GROUP BY c) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
grouping
----------
0
0
0
(3 rows)
-- Combinations of operations
select a, b, c, d from gstest2 group by rollup(a,b),grouping sets(c,d);
a | b | c | d
---+---+---+---
1 | 1 | 1 |
1 | | 1 |
| | 1 |
1 | 1 | 2 |
1 | 2 | 2 |
1 | | 2 |
2 | 2 | 2 |
2 | | 2 |
| | 2 |
1 | 1 | | 1
1 | | | 1
| | | 1
1 | 1 | | 2
1 | 2 | | 2
1 | | | 2
2 | 2 | | 2
2 | | | 2
| | | 2
(18 rows)
select a, b from (values (1,2),(2,3)) v(a,b) group by a,b, grouping sets(a);
a | b
---+---
1 | 2
2 | 3
(2 rows)
-- Tests for chained aggregates
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | 1 | 0 | 21 | 2 | 11
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
2 | 3 | 0 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
4 | 1 | 0 | 37 | 2 | 19
| | 3 | 21 | 2 | 11
| | 3 | 21 | 2 | 11
| | 3 | 25 | 2 | 13
| | 3 | 25 | 2 | 13
| | 3 | 14 | 1 | 14
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
| | 3 | 14 | 1 | 14
| | 3 | 15 | 1 | 15
| | 3 | 15 | 1 | 15
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
| | 3 | 16 | 1 | 16
| | 3 | 16 | 1 | 16
| | 3 | 17 | 1 | 17
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
| | 3 | 17 | 1 | 17
| | 3 | 37 | 2 | 19
| | 3 | 37 | 2 | 19
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
(21 rows)
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP((e+1),(f+1));
grouping
----------
0
0
0
(3 rows)
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY CUBE((e+1),(f+1)) ORDER BY (e+1),(f+1);
grouping
----------
0
0
0
0
(4 rows)
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;
a | b | sum | rsum
---+---+-----+------
1 | 1 | 8 | 8
1 | 2 | 2 | 10
1 | | 10 | 20
2 | 2 | 2 | 22
2 | | 2 | 24
| 1 | 8 | 32
| 2 | 4 | 36
| | 12 | 48
(8 rows)
select a, b, sum(c) from (values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),(2,3,15),(3,3,16),(3,4,17),(4,1,18),(4,1,19)) v(a,b,c) group by rollup (a,b);
a | b | sum
---+---+-----
1 | 1 | 21
1 | 2 | 25
1 | 3 | 14
1 | | 60
2 | 3 | 15
2 | | 15
3 | 3 | 16
3 | 4 | 17
3 | | 33
4 | 1 | 37
4 | | 37
| | 145
(12 rows)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;
a | b | sum
---+---+-----
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
1 | | 3
2 | 1 | 2
2 | 2 | 2
2 | 3 | 2
2 | | 6
| 1 | 3
| 2 | 3
| 3 | 3
| | 9
(12 rows)
-- Test reordering of grouping sets
explain (costs off)
select * from gstest1 group by grouping sets((a,b,v),(v)) order by v,b,a;
QUERY PLAN
------------------------------------------------------------------------------
GroupAggregate
Group Key: "*VALUES*".column3, "*VALUES*".column2, "*VALUES*".column1
Group Key: "*VALUES*".column3
-> Sort
Sort Key: "*VALUES*".column3, "*VALUES*".column2, "*VALUES*".column1
-> Values Scan on "*VALUES*"
(6 rows)
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- Agg level check. This query should error out.
select (select grouping(a,b) from gstest2) from gstest2 group by a,b;
ERROR: arguments to GROUPING must be grouping expressions of the associated query level
LINE 1: select (select grouping(a,b) from gstest2) from gstest2 grou...
^
--Nested queries
select a, b, sum(c), count(*) from gstest2 group by grouping sets (rollup(a,b),a);
a | b | sum | count
---+---+-----+-------
1 | 1 | 8 | 7
1 | 2 | 2 | 1
1 | | 10 | 8
1 | | 10 | 8
2 | 2 | 2 | 1
2 | | 2 | 1
2 | | 2 | 1
| | 12 | 9
(8 rows)
-- HAVING queries
select ten, sum(distinct four) from onek a
group by grouping sets((ten,four),(ten))
having exists (select 1 from onek b where sum(distinct a.four) = b.four);
ten | sum
-----+-----
0 | 0
0 | 2
0 | 2
1 | 1
1 | 3
2 | 0
2 | 2
2 | 2
3 | 1
3 | 3
4 | 0
4 | 2
4 | 2
5 | 1
5 | 3
6 | 0
6 | 2
6 | 2
7 | 1
7 | 3
8 | 0
8 | 2
8 | 2
9 | 1
9 | 3
(25 rows)
-- Tests around pushdown of HAVING clauses, partially testing against previous bugs
select a,count(*) from gstest2 group by rollup(a) order by a;
a | count
---+-------
1 | 8
2 | 1
| 9
(3 rows)
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;
a | count
---+-------
2 | 1
| 9
(2 rows)
explain (costs off)
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;
QUERY PLAN
----------------------------------
GroupAggregate
Group Key: a
Group Key: ()
Filter: (a IS DISTINCT FROM 1)
-> Sort
Sort Key: a
-> Seq Scan on gstest2
(7 rows)
select v.c, (select count(*) from gstest2 group by () having v.c)
from (values (false),(true)) v(c) order by v.c;
c | count
---+-------
f |
t | 9
(2 rows)
explain (costs off)
select v.c, (select count(*) from gstest2 group by () having v.c)
from (values (false),(true)) v(c) order by v.c;
QUERY PLAN
-----------------------------------------------------------
Sort
Sort Key: "*VALUES*".column1
-> Values Scan on "*VALUES*"
SubPlan 1
-> Aggregate
Group Key: ()
Filter: "*VALUES*".column1
-> Result
One-Time Filter: "*VALUES*".column1
-> Seq Scan on gstest2
(10 rows)
-- HAVING with GROUPING queries
select ten, grouping(ten) from onek
group by grouping sets(ten) having grouping(ten) >= 0
order by 2,1;
ten | grouping
-----+----------
0 | 0
1 | 0
2 | 0
3 | 0
4 | 0
5 | 0
6 | 0
7 | 0
8 | 0
9 | 0
(10 rows)
select ten, grouping(ten) from onek
group by grouping sets(ten, four) having grouping(ten) > 0
order by 2,1;
ten | grouping
-----+----------
| 1
| 1
| 1
| 1
(4 rows)
select ten, grouping(ten) from onek
group by rollup(ten) having grouping(ten) > 0
order by 2,1;
ten | grouping
-----+----------
| 1
(1 row)
select ten, grouping(ten) from onek
group by cube(ten) having grouping(ten) > 0
order by 2,1;
ten | grouping
-----+----------
| 1
(1 row)
select ten, grouping(ten) from onek
group by (ten) having grouping(ten) >= 0
order by 2,1;
ten | grouping
-----+----------
0 | 0
1 | 0
2 | 0
3 | 0
4 | 0
5 | 0
6 | 0
7 | 0
8 | 0
9 | 0
(10 rows)
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- FILTER queries
select ten, sum(distinct four) filter (where four::text ~ '123') from onek a
group by rollup(ten);
ten | sum
-----+-----
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
(11 rows)
-- More rescan tests
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
a | a | four | ten | count
---+---+------+-----+-------
1 | 1 | 0 | 0 | 50
1 | 1 | 0 | 2 | 50
1 | 1 | 0 | 4 | 50
1 | 1 | 0 | 6 | 50
1 | 1 | 0 | 8 | 50
1 | 1 | 0 | | 250
1 | 1 | 1 | 1 | 50
1 | 1 | 1 | 3 | 50
1 | 1 | 1 | 5 | 50
1 | 1 | 1 | 7 | 50
1 | 1 | 1 | 9 | 50
1 | 1 | 1 | | 250
1 | 1 | 2 | 0 | 50
1 | 1 | 2 | 2 | 50
1 | 1 | 2 | 4 | 50
1 | 1 | 2 | 6 | 50
1 | 1 | 2 | 8 | 50
1 | 1 | 2 | | 250
1 | 1 | 3 | 1 | 50
1 | 1 | 3 | 3 | 50
1 | 1 | 3 | 5 | 50
1 | 1 | 3 | 7 | 50
1 | 1 | 3 | 9 | 50
1 | 1 | 3 | | 250
1 | 1 | | 0 | 100
1 | 1 | | 1 | 100
1 | 1 | | 2 | 100
1 | 1 | | 3 | 100
1 | 1 | | 4 | 100
1 | 1 | | 5 | 100
1 | 1 | | 6 | 100
1 | 1 | | 7 | 100
1 | 1 | | 8 | 100
1 | 1 | | 9 | 100
1 | 1 | | | 1000
2 | 2 | 0 | 0 | 50
2 | 2 | 0 | 2 | 50
2 | 2 | 0 | 4 | 50
2 | 2 | 0 | 6 | 50
2 | 2 | 0 | 8 | 50
2 | 2 | 0 | | 250
2 | 2 | 1 | 1 | 50
2 | 2 | 1 | 3 | 50
2 | 2 | 1 | 5 | 50
2 | 2 | 1 | 7 | 50
2 | 2 | 1 | 9 | 50
2 | 2 | 1 | | 250
2 | 2 | 2 | 0 | 50
2 | 2 | 2 | 2 | 50
2 | 2 | 2 | 4 | 50
2 | 2 | 2 | 6 | 50
2 | 2 | 2 | 8 | 50
2 | 2 | 2 | | 250
2 | 2 | 3 | 1 | 50
2 | 2 | 3 | 3 | 50
2 | 2 | 3 | 5 | 50
2 | 2 | 3 | 7 | 50
2 | 2 | 3 | 9 | 50
2 | 2 | 3 | | 250
2 | 2 | | 0 | 100
2 | 2 | | 1 | 100
2 | 2 | | 2 | 100
2 | 2 | | 3 | 100
2 | 2 | | 4 | 100
2 | 2 | | 5 | 100
2 | 2 | | 6 | 100
2 | 2 | | 7 | 100
2 | 2 | | 8 | 100
2 | 2 | | 9 | 100
2 | 2 | | | 1000
(70 rows)
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);
array
------------------------------------------------------------------------------------------------------------------------------------------------------
{"(1,0,0,250)","(1,0,2,250)","(1,0,,500)","(1,1,1,250)","(1,1,3,250)","(1,1,,500)","(1,,0,250)","(1,,1,250)","(1,,2,250)","(1,,3,250)","(1,,,1000)"}
{"(2,0,0,250)","(2,0,2,250)","(2,0,,500)","(2,1,1,250)","(2,1,3,250)","(2,1,,500)","(2,,0,250)","(2,,1,250)","(2,,2,250)","(2,,3,250)","(2,,,1000)"}
(2 rows)
-- Grouping on text columns
select sum(ten) from onek group by two, rollup(four::text) order by 1;
sum
------
1000
1000
1250
1250
2000
2500
(6 rows)
select sum(ten) from onek group by rollup(four::text), two order by 1;
sum
------
1000
1000
1250
1250
2000
2500
(6 rows)
-- hashing support
set enable_hashagg = true;
-- failure cases
select count(*) from gstest4 group by rollup(unhashable_col,unsortable_col);
ERROR: could not implement GROUP BY
DETAIL: Some of the datatypes only support hashing, while others only support sorting.
select array_agg(v order by v) from gstest4 group by grouping sets ((id,unsortable_col),(id));
ERROR: could not implement GROUP BY
DETAIL: Some of the datatypes only support hashing, while others only support sorting.
-- simple cases
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | | 1 | 60 | 5 | 14
2 | | 1 | 15 | 1 | 15
3 | | 1 | 33 | 2 | 17
4 | | 1 | 37 | 2 | 19
| 1 | 2 | 58 | 4 | 19
| 2 | 2 | 25 | 2 | 13
| 3 | 2 | 45 | 3 | 16
| 4 | 2 | 17 | 1 | 17
(8 rows)
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Sort
Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2
-> HashAggregate
Hash Key: "*VALUES*".column1
Hash Key: "*VALUES*".column2
-> Values Scan on "*VALUES*"
(6 rows)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by cube(a,b) order by 3,1,2;
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | 1 | 0 | 21 | 2 | 11
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
2 | 3 | 0 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
4 | 1 | 0 | 37 | 2 | 19
1 | | 1 | 60 | 5 | 14
2 | | 1 | 15 | 1 | 15
3 | | 1 | 33 | 2 | 17
4 | | 1 | 37 | 2 | 19
| 1 | 2 | 58 | 4 | 19
| 2 | 2 | 25 | 2 | 13
| 3 | 2 | 45 | 3 | 16
| 4 | 2 | 17 | 1 | 17
| | 3 | 145 | 10 | 19
(16 rows)
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by cube(a,b) order by 3,1,2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Sort
Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2
-> MixedAggregate
Hash Key: "*VALUES*".column1, "*VALUES*".column2
Hash Key: "*VALUES*".column1
Hash Key: "*VALUES*".column2
Group Key: ()
-> Values Scan on "*VALUES*"
(8 rows)
-- shouldn't try and hash
explain (costs off)
select a, b, grouping(a,b), array_agg(v order by v)
from gstest1 group by cube(a,b);
QUERY PLAN
----------------------------------------------------------
GroupAggregate
Group Key: "*VALUES*".column1, "*VALUES*".column2
Group Key: "*VALUES*".column1
Group Key: ()
Sort Key: "*VALUES*".column2
Group Key: "*VALUES*".column2
-> Sort
Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
(9 rows)
-- unsortable cases
select unsortable_col, count(*)
from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
order by unsortable_col::text;
unsortable_col | count
----------------+-------
1 | 4
1 | 4
2 | 4
2 | 4
(4 rows)
-- mixed hashable/sortable cases
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
order by 3, 5;
unhashable_col | unsortable_col | grouping | count | sum
----------------+----------------+----------+-------+-----
0000 | | 1 | 2 | 17
0001 | | 1 | 2 | 34
0010 | | 1 | 2 | 68
0011 | | 1 | 2 | 136
| 2 | 2 | 4 | 60
| 1 | 2 | 4 | 195
(6 rows)
explain (costs off)
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
order by 3,5;
QUERY PLAN
------------------------------------------------------------------
Sort
Sort Key: (GROUPING(unhashable_col, unsortable_col)), (sum(v))
-> MixedAggregate
Hash Key: unsortable_col
Group Key: unhashable_col
-> Sort
Sort Key: unhashable_col
-> Seq Scan on gstest4
(8 rows)
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
order by 3,5;
unhashable_col | unsortable_col | grouping | count | sum
----------------+----------------+----------+-------+-----
0000 | | 1 | 1 | 1
0001 | | 1 | 1 | 2
0010 | | 1 | 1 | 4
0011 | | 1 | 1 | 8
0000 | | 1 | 1 | 16
0001 | | 1 | 1 | 32
0010 | | 1 | 1 | 64
0011 | | 1 | 1 | 128
| 1 | 2 | 1 | 1
| 1 | 2 | 1 | 2
| 2 | 2 | 1 | 4
| 2 | 2 | 1 | 8
| 2 | 2 | 1 | 16
| 2 | 2 | 1 | 32
| 1 | 2 | 1 | 64
| 1 | 2 | 1 | 128
(16 rows)
explain (costs off)
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
order by 3,5;
QUERY PLAN
------------------------------------------------------------------
Sort
Sort Key: (GROUPING(unhashable_col, unsortable_col)), (sum(v))
-> MixedAggregate
Hash Key: v, unsortable_col
Group Key: v, unhashable_col
-> Sort
Sort Key: v, unhashable_col
-> Seq Scan on gstest4
(8 rows)
-- empty input: first is 0 rows, second 1, third 3 etc.
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
a | b | sum | count
---+---+-----+-------
(0 rows)
explain (costs off)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
QUERY PLAN
--------------------------------
HashAggregate
Hash Key: a, b
Hash Key: a
-> Seq Scan on gstest_empty
(4 rows)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
a | b | sum | count
---+---+-----+-------
| | | 0
(1 row)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
a | b | sum | count
---+---+-----+-------
| | | 0
| | | 0
| | | 0
(3 rows)
explain (costs off)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
QUERY PLAN
--------------------------------
MixedAggregate
Hash Key: a, b
Group Key: ()
Group Key: ()
Group Key: ()
-> Seq Scan on gstest_empty
(6 rows)
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
sum | count
-----+-------
| 0
| 0
| 0
(3 rows)
explain (costs off)
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
QUERY PLAN
--------------------------------
Aggregate
Group Key: ()
Group Key: ()
Group Key: ()
-> Seq Scan on gstest_empty
(5 rows)
-- check that functionally dependent cols are not nulled
select a, d, grouping(a,b,c)
from gstest3
group by grouping sets ((a,b), (a,c));
a | d | grouping
---+---+----------
1 | 1 | 1
2 | 2 | 1
1 | 1 | 2
2 | 2 | 2
(4 rows)
explain (costs off)
select a, d, grouping(a,b,c)
from gstest3
group by grouping sets ((a,b), (a,c));
QUERY PLAN
---------------------------
HashAggregate
Hash Key: a, b
Hash Key: a, c
-> Seq Scan on gstest3
(4 rows)
-- simple rescan tests
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by grouping sets (a,b)
order by 1, 2, 3;
a | b | sum
---+---+-----
1 | | 3
2 | | 6
| 1 | 3
| 2 | 3
| 3 | 3
(5 rows)
explain (costs off)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by grouping sets (a,b)
order by 3, 1, 2;
QUERY PLAN
---------------------------------------------------------------------
Sort
Sort Key: (sum("*VALUES*".column1)), gstest_data.a, gstest_data.b
-> HashAggregate
Hash Key: gstest_data.a
Hash Key: gstest_data.b
-> Nested Loop
-> Values Scan on "*VALUES*"
-> Function Scan on gstest_data
(8 rows)
select *
from (values (1),(2)) v(x),
lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;
ERROR: aggregate functions are not allowed in FROM clause of their own query level
LINE 3: lateral (select a, b, sum(v.x) from gstest_data(v.x) ...
^
explain (costs off)
select *
from (values (1),(2)) v(x),
lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;
ERROR: aggregate functions are not allowed in FROM clause of their own query level
LINE 4: lateral (select a, b, sum(v.x) from gstest_data(v.x...
^
-- Tests for chained aggregates
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
a | b | grouping | sum | count | max
---+---+----------+-----+-------+-----
1 | 1 | 0 | 21 | 2 | 11
1 | 2 | 0 | 25 | 2 | 13
1 | 3 | 0 | 14 | 1 | 14
2 | 3 | 0 | 15 | 1 | 15
3 | 3 | 0 | 16 | 1 | 16
3 | 4 | 0 | 17 | 1 | 17
4 | 1 | 0 | 37 | 2 | 19
| | 3 | 21 | 2 | 11
| | 3 | 21 | 2 | 11
| | 3 | 25 | 2 | 13
| | 3 | 25 | 2 | 13
| | 3 | 14 | 1 | 14
| | 3 | 14 | 1 | 14
| | 3 | 15 | 1 | 15
| | 3 | 15 | 1 | 15
| | 3 | 16 | 1 | 16
| | 3 | 16 | 1 | 16
| | 3 | 17 | 1 | 17
| | 3 | 17 | 1 | 17
| | 3 | 37 | 2 | 19
| | 3 | 37 | 2 | 19
(21 rows)
explain (costs off)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort
Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), (max("*VALUES*".column3))
-> HashAggregate
Hash Key: "*VALUES*".column1, "*VALUES*".column2
Hash Key: ("*VALUES*".column1 + 1), ("*VALUES*".column2 + 1)
Hash Key: ("*VALUES*".column1 + 2), ("*VALUES*".column2 + 2)
-> Values Scan on "*VALUES*"
(7 rows)
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;
a | b | sum | rsum
---+---+-----+------
1 | 1 | 8 | 8
1 | 2 | 2 | 10
1 | | 10 | 20
2 | 2 | 2 | 22
2 | | 2 | 24
| 1 | 8 | 32
| 2 | 4 | 36
| | 12 | 48
(8 rows)
explain (costs off)
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;
QUERY PLAN
---------------------------------------------
Sort
Sort Key: (sum((sum(c))) OVER (?)), a, b
-> WindowAgg
-> Sort
Sort Key: a, b
-> MixedAggregate
Hash Key: a, b
Hash Key: a
Hash Key: b
Group Key: ()
-> Seq Scan on gstest2
(11 rows)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;
a | b | sum
---+---+-----
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
1 | | 3
2 | 1 | 2
2 | 2 | 2
2 | 3 | 2
2 | | 6
| 1 | 3
| 2 | 3
| 3 | 3
| | 9
(12 rows)
explain (costs off)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;
QUERY PLAN
------------------------------------------------
Sort
Sort Key: gstest_data.a, gstest_data.b
-> MixedAggregate
Hash Key: gstest_data.a, gstest_data.b
Hash Key: gstest_data.a
Hash Key: gstest_data.b
Group Key: ()
-> Nested Loop
-> Values Scan on "*VALUES*"
-> Function Scan on gstest_data
(10 rows)
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
-- Verify that we correctly handle the child node returning a
-- non-minimal slot, which happens if the input is pre-sorted,
-- e.g. due to an index scan.
BEGIN;
SET LOCAL enable_hashagg = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
QUERY PLAN
---------------------------------------
Sort
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
Sort Key: a, b
-> GroupAggregate
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
Group Key: a
Group Key: ()
Sort Key: b
Group Key: b
-> Sort
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
Sort Key: a
-> Seq Scan on gstest3
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
(10 rows)
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
a | b | count | max | max
---+---+-------+-----+-----
1 | | 1 | 1 | 1
2 | | 1 | 2 | 2
| 1 | 1 | 1 | 1
| 2 | 1 | 2 | 2
| | 2 | 2 | 2
(5 rows)
SET LOCAL enable_seqscan = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
QUERY PLAN
------------------------------------------------------
Sort
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
Sort Key: a, b
-> GroupAggregate
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
Group Key: a
Group Key: ()
Sort Key: b
Group Key: b
-> Index Scan using gstest3_pkey on gstest3
Fix slot type handling for Agg nodes performing internal sorts. Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged as such. That allows to either skip deforming (for a virtual tuple slot) or optimize the code for JIT accelerated deforming appropriately (for other known slot types). This assumption was sometimes violated for grouping sets, when nodeAgg.c internally uses tuplesorts, and the child node doesn't return a TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer slot might not be "fixed". It's probably worthwhile to optimize this further in the future, and more granularly determine whether the slot is fixed. As we already instantiate per-phase transition and equal expressions, we could cheaply set the slot type appropriately for each phase. But that's a separate change from this bugfix. This commit does include a very minor optimization by avoiding to create a slot for handling tuplesorts, if no such sorts are performed. Previously we created that slot unnecessarily in the common case of computing all grouping sets via hashing. The code looked too confusing without that, as the conditions for needing a sort slot and flagging that the slot type isn't fixed, are the same. Reported-By: Ashutosh Sharma Author: Andres Freund Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 23:22:52 +02:00
(8 rows)
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
a | b | count | max | max
---+---+-------+-----+-----
1 | | 1 | 1 | 1
2 | | 1 | 2 | 2
| 1 | 1 | 1 | 1
| 2 | 1 | 2 | 2
| | 2 | 2 | 2
(5 rows)
COMMIT;
-- More rescan tests
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
a | a | four | ten | count
---+---+------+-----+-------
1 | 1 | 0 | 0 | 50
1 | 1 | 0 | 2 | 50
1 | 1 | 0 | 4 | 50
1 | 1 | 0 | 6 | 50
1 | 1 | 0 | 8 | 50
1 | 1 | 0 | | 250
1 | 1 | 1 | 1 | 50
1 | 1 | 1 | 3 | 50
1 | 1 | 1 | 5 | 50
1 | 1 | 1 | 7 | 50
1 | 1 | 1 | 9 | 50
1 | 1 | 1 | | 250
1 | 1 | 2 | 0 | 50
1 | 1 | 2 | 2 | 50
1 | 1 | 2 | 4 | 50
1 | 1 | 2 | 6 | 50
1 | 1 | 2 | 8 | 50
1 | 1 | 2 | | 250
1 | 1 | 3 | 1 | 50
1 | 1 | 3 | 3 | 50
1 | 1 | 3 | 5 | 50
1 | 1 | 3 | 7 | 50
1 | 1 | 3 | 9 | 50
1 | 1 | 3 | | 250
1 | 1 | | 0 | 100
1 | 1 | | 1 | 100
1 | 1 | | 2 | 100
1 | 1 | | 3 | 100
1 | 1 | | 4 | 100
1 | 1 | | 5 | 100
1 | 1 | | 6 | 100
1 | 1 | | 7 | 100
1 | 1 | | 8 | 100
1 | 1 | | 9 | 100
1 | 1 | | | 1000
2 | 2 | 0 | 0 | 50
2 | 2 | 0 | 2 | 50
2 | 2 | 0 | 4 | 50
2 | 2 | 0 | 6 | 50
2 | 2 | 0 | 8 | 50
2 | 2 | 0 | | 250
2 | 2 | 1 | 1 | 50
2 | 2 | 1 | 3 | 50
2 | 2 | 1 | 5 | 50
2 | 2 | 1 | 7 | 50
2 | 2 | 1 | 9 | 50
2 | 2 | 1 | | 250
2 | 2 | 2 | 0 | 50
2 | 2 | 2 | 2 | 50
2 | 2 | 2 | 4 | 50
2 | 2 | 2 | 6 | 50
2 | 2 | 2 | 8 | 50
2 | 2 | 2 | | 250
2 | 2 | 3 | 1 | 50
2 | 2 | 3 | 3 | 50
2 | 2 | 3 | 5 | 50
2 | 2 | 3 | 7 | 50
2 | 2 | 3 | 9 | 50
2 | 2 | 3 | | 250
2 | 2 | | 0 | 100
2 | 2 | | 1 | 100
2 | 2 | | 2 | 100
2 | 2 | | 3 | 100
2 | 2 | | 4 | 100
2 | 2 | | 5 | 100
2 | 2 | | 6 | 100
2 | 2 | | 7 | 100
2 | 2 | | 8 | 100
2 | 2 | | 9 | 100
2 | 2 | | | 1000
(70 rows)
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);
array
------------------------------------------------------------------------------------------------------------------------------------------------------
{"(1,0,0,250)","(1,0,2,250)","(1,0,,500)","(1,1,1,250)","(1,1,3,250)","(1,1,,500)","(1,,0,250)","(1,,1,250)","(1,,2,250)","(1,,3,250)","(1,,,1000)"}
{"(2,0,0,250)","(2,0,2,250)","(2,0,,500)","(2,1,1,250)","(2,1,3,250)","(2,1,,500)","(2,,0,250)","(2,,1,250)","(2,,2,250)","(2,,3,250)","(2,,,1000)"}
(2 rows)
-- Rescan logic changes when there are no empty grouping sets, so test
-- that too:
select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by grouping sets(four,ten)) s on true order by v.a,four,ten;
a | a | four | ten | count
---+---+------+-----+-------
1 | 1 | 0 | | 250
1 | 1 | 1 | | 250
1 | 1 | 2 | | 250
1 | 1 | 3 | | 250
1 | 1 | | 0 | 100
1 | 1 | | 1 | 100
1 | 1 | | 2 | 100
1 | 1 | | 3 | 100
1 | 1 | | 4 | 100
1 | 1 | | 5 | 100
1 | 1 | | 6 | 100
1 | 1 | | 7 | 100
1 | 1 | | 8 | 100
1 | 1 | | 9 | 100
2 | 2 | 0 | | 250
2 | 2 | 1 | | 250
2 | 2 | 2 | | 250
2 | 2 | 3 | | 250
2 | 2 | | 0 | 100
2 | 2 | | 1 | 100
2 | 2 | | 2 | 100
2 | 2 | | 3 | 100
2 | 2 | | 4 | 100
2 | 2 | | 5 | 100
2 | 2 | | 6 | 100
2 | 2 | | 7 | 100
2 | 2 | | 8 | 100
2 | 2 | | 9 | 100
(28 rows)
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by grouping sets(two,four) order by two,four) s1) from (values (1),(2)) v(a);
array
---------------------------------------------------------------------------------
{"(1,0,,500)","(1,1,,500)","(1,,0,250)","(1,,1,250)","(1,,2,250)","(1,,3,250)"}
{"(2,0,,500)","(2,1,,500)","(2,,0,250)","(2,,1,250)","(2,,2,250)","(2,,3,250)"}
(2 rows)
-- test the knapsack
set enable_indexscan = false;
set hash_mem_multiplier = 1.0;
set work_mem = '64kB';
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);
QUERY PLAN
-------------------------------
MixedAggregate
Hash Key: two
Hash Key: four
Hash Key: ten
Hash Key: hundred
Group Key: unique1
Sort Key: twothousand
Group Key: twothousand
Sort Key: thousand
Group Key: thousand
-> Sort
Sort Key: unique1
-> Seq Scan on tenk1
(13 rows)
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,hundred,ten,four,two);
QUERY PLAN
-------------------------------
MixedAggregate
Hash Key: two
Hash Key: four
Hash Key: ten
Hash Key: hundred
Group Key: unique1
-> Sort
Sort Key: unique1
-> Seq Scan on tenk1
(9 rows)
set work_mem = '384kB';
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);
QUERY PLAN
-------------------------------
MixedAggregate
Hash Key: two
Hash Key: four
Hash Key: ten
Hash Key: hundred
Hash Key: thousand
Group Key: unique1
Sort Key: twothousand
Group Key: twothousand
-> Sort
Sort Key: unique1
-> Seq Scan on tenk1
(12 rows)
-- check collation-sensitive matching between grouping expressions
-- (similar to a check for aggregates, but there are additional code
-- paths for GROUPING, so check again here)
select v||'a', case grouping(v||'a') when 1 then 1 else 0 end, count(*)
from unnest(array[1,1], array['a','b']) u(i,v)
group by rollup(i, v||'a') order by 1,3;
?column? | case | count
----------+------+-------
aa | 0 | 1
ba | 0 | 1
| 1 | 2
| 1 | 2
(4 rows)
select v||'a', case when grouping(v||'a') = 1 then 1 else 0 end, count(*)
from unnest(array[1,1], array['a','b']) u(i,v)
group by rollup(i, v||'a') order by 1,3;
?column? | case | count
----------+------+-------
aa | 0 | 1
ba | 0 | 1
| 1 | 2
| 1 | 2
(4 rows)
-- Bug #16784
create table bug_16784(i int, j int);
analyze bug_16784;
alter table bug_16784 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname='bug_16784';
insert into bug_16784 select g/10, g from generate_series(1,40) g;
set work_mem='64kB';
set enable_sort = false;
select * from
(values (1),(2)) v(a),
lateral (select a, i, j, count(*) from
bug_16784 group by cube(i,j)) s
order by v.a, i, j;
a | a | i | j | count
---+---+---+----+-------
1 | 1 | 0 | 1 | 1
1 | 1 | 0 | 2 | 1
1 | 1 | 0 | 3 | 1
1 | 1 | 0 | 4 | 1
1 | 1 | 0 | 5 | 1
1 | 1 | 0 | 6 | 1
1 | 1 | 0 | 7 | 1
1 | 1 | 0 | 8 | 1
1 | 1 | 0 | 9 | 1
1 | 1 | 0 | | 9
1 | 1 | 1 | 10 | 1
1 | 1 | 1 | 11 | 1
1 | 1 | 1 | 12 | 1
1 | 1 | 1 | 13 | 1
1 | 1 | 1 | 14 | 1
1 | 1 | 1 | 15 | 1
1 | 1 | 1 | 16 | 1
1 | 1 | 1 | 17 | 1
1 | 1 | 1 | 18 | 1
1 | 1 | 1 | 19 | 1
1 | 1 | 1 | | 10
1 | 1 | 2 | 20 | 1
1 | 1 | 2 | 21 | 1
1 | 1 | 2 | 22 | 1
1 | 1 | 2 | 23 | 1
1 | 1 | 2 | 24 | 1
1 | 1 | 2 | 25 | 1
1 | 1 | 2 | 26 | 1
1 | 1 | 2 | 27 | 1
1 | 1 | 2 | 28 | 1
1 | 1 | 2 | 29 | 1
1 | 1 | 2 | | 10
1 | 1 | 3 | 30 | 1
1 | 1 | 3 | 31 | 1
1 | 1 | 3 | 32 | 1
1 | 1 | 3 | 33 | 1
1 | 1 | 3 | 34 | 1
1 | 1 | 3 | 35 | 1
1 | 1 | 3 | 36 | 1
1 | 1 | 3 | 37 | 1
1 | 1 | 3 | 38 | 1
1 | 1 | 3 | 39 | 1
1 | 1 | 3 | | 10
1 | 1 | 4 | 40 | 1
1 | 1 | 4 | | 1
1 | 1 | | 1 | 1
1 | 1 | | 2 | 1
1 | 1 | | 3 | 1
1 | 1 | | 4 | 1
1 | 1 | | 5 | 1
1 | 1 | | 6 | 1
1 | 1 | | 7 | 1
1 | 1 | | 8 | 1
1 | 1 | | 9 | 1
1 | 1 | | 10 | 1
1 | 1 | | 11 | 1
1 | 1 | | 12 | 1
1 | 1 | | 13 | 1
1 | 1 | | 14 | 1
1 | 1 | | 15 | 1
1 | 1 | | 16 | 1
1 | 1 | | 17 | 1
1 | 1 | | 18 | 1
1 | 1 | | 19 | 1
1 | 1 | | 20 | 1
1 | 1 | | 21 | 1
1 | 1 | | 22 | 1
1 | 1 | | 23 | 1
1 | 1 | | 24 | 1
1 | 1 | | 25 | 1
1 | 1 | | 26 | 1
1 | 1 | | 27 | 1
1 | 1 | | 28 | 1
1 | 1 | | 29 | 1
1 | 1 | | 30 | 1
1 | 1 | | 31 | 1
1 | 1 | | 32 | 1
1 | 1 | | 33 | 1
1 | 1 | | 34 | 1
1 | 1 | | 35 | 1
1 | 1 | | 36 | 1
1 | 1 | | 37 | 1
1 | 1 | | 38 | 1
1 | 1 | | 39 | 1
1 | 1 | | 40 | 1
1 | 1 | | | 40
2 | 2 | 0 | 1 | 1
2 | 2 | 0 | 2 | 1
2 | 2 | 0 | 3 | 1
2 | 2 | 0 | 4 | 1
2 | 2 | 0 | 5 | 1
2 | 2 | 0 | 6 | 1
2 | 2 | 0 | 7 | 1
2 | 2 | 0 | 8 | 1
2 | 2 | 0 | 9 | 1
2 | 2 | 0 | | 9
2 | 2 | 1 | 10 | 1
2 | 2 | 1 | 11 | 1
2 | 2 | 1 | 12 | 1
2 | 2 | 1 | 13 | 1
2 | 2 | 1 | 14 | 1
2 | 2 | 1 | 15 | 1
2 | 2 | 1 | 16 | 1
2 | 2 | 1 | 17 | 1
2 | 2 | 1 | 18 | 1
2 | 2 | 1 | 19 | 1
2 | 2 | 1 | | 10
2 | 2 | 2 | 20 | 1
2 | 2 | 2 | 21 | 1
2 | 2 | 2 | 22 | 1
2 | 2 | 2 | 23 | 1
2 | 2 | 2 | 24 | 1
2 | 2 | 2 | 25 | 1
2 | 2 | 2 | 26 | 1
2 | 2 | 2 | 27 | 1
2 | 2 | 2 | 28 | 1
2 | 2 | 2 | 29 | 1
2 | 2 | 2 | | 10
2 | 2 | 3 | 30 | 1
2 | 2 | 3 | 31 | 1
2 | 2 | 3 | 32 | 1
2 | 2 | 3 | 33 | 1
2 | 2 | 3 | 34 | 1
2 | 2 | 3 | 35 | 1
2 | 2 | 3 | 36 | 1
2 | 2 | 3 | 37 | 1
2 | 2 | 3 | 38 | 1
2 | 2 | 3 | 39 | 1
2 | 2 | 3 | | 10
2 | 2 | 4 | 40 | 1
2 | 2 | 4 | | 1
2 | 2 | | 1 | 1
2 | 2 | | 2 | 1
2 | 2 | | 3 | 1
2 | 2 | | 4 | 1
2 | 2 | | 5 | 1
2 | 2 | | 6 | 1
2 | 2 | | 7 | 1
2 | 2 | | 8 | 1
2 | 2 | | 9 | 1
2 | 2 | | 10 | 1
2 | 2 | | 11 | 1
2 | 2 | | 12 | 1
2 | 2 | | 13 | 1
2 | 2 | | 14 | 1
2 | 2 | | 15 | 1
2 | 2 | | 16 | 1
2 | 2 | | 17 | 1
2 | 2 | | 18 | 1
2 | 2 | | 19 | 1
2 | 2 | | 20 | 1
2 | 2 | | 21 | 1
2 | 2 | | 22 | 1
2 | 2 | | 23 | 1
2 | 2 | | 24 | 1
2 | 2 | | 25 | 1
2 | 2 | | 26 | 1
2 | 2 | | 27 | 1
2 | 2 | | 28 | 1
2 | 2 | | 29 | 1
2 | 2 | | 30 | 1
2 | 2 | | 31 | 1
2 | 2 | | 32 | 1
2 | 2 | | 33 | 1
2 | 2 | | 34 | 1
2 | 2 | | 35 | 1
2 | 2 | | 36 | 1
2 | 2 | | 37 | 1
2 | 2 | | 38 | 1
2 | 2 | | 39 | 1
2 | 2 | | 40 | 1
2 | 2 | | | 40
(172 rows)
--
-- Compare results between plans using sorting and plans using hash
-- aggregation. Force spilling in both cases by setting work_mem low
-- and altering the statistics.
--
create table gs_data_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g
from generate_series(0,1999) g;
analyze gs_data_1;
alter table gs_data_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname='gs_data_1';
set work_mem='64kB';
-- Produce results with sorting.
set enable_sort = true;
set enable_hashagg = false;
set jit_above_cost = 0;
explain (costs off)
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);
QUERY PLAN
------------------------------------
GroupAggregate
Group Key: g1000, g100, g10
Group Key: g1000, g100
Group Key: g1000
Group Key: ()
Sort Key: g100, g10
Group Key: g100, g10
Group Key: g100
Sort Key: g10, g1000
Group Key: g10, g1000
Group Key: g10
-> Sort
Sort Key: g1000, g100, g10
-> Seq Scan on gs_data_1
(14 rows)
create table gs_group_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);
-- Produce results with hash aggregation.
set enable_hashagg = true;
set enable_sort = false;
explain (costs off)
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);
QUERY PLAN
------------------------------
MixedAggregate
Hash Key: g1000, g100, g10
Hash Key: g1000, g100
Hash Key: g1000
Hash Key: g100, g10
Hash Key: g100
Hash Key: g10, g1000
Hash Key: g10
Group Key: ()
-> Seq Scan on gs_data_1
(10 rows)
create table gs_hash_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);
set enable_sort = true;
set work_mem to default;
set hash_mem_multiplier to default;
-- Compare results
(select * from gs_hash_1 except select * from gs_group_1)
union all
(select * from gs_group_1 except select * from gs_hash_1);
g100 | g10 | sum | count | max
------+-----+-----+-------+-----
(0 rows)
drop table gs_group_1;
drop table gs_hash_1;
-- GROUP BY DISTINCT
-- "normal" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by all rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | 2 |
1 | | 3
1 | | 3
1 | |
1 | |
1 | |
4 | | 6
4 | | 6
4 | | 6
4 | |
4 | |
4 | |
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | 8 |
7 | | 9
7 | | 9
7 | |
7 | |
7 | |
| |
(25 rows)
-- ...which is also the default
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | 2 |
1 | | 3
1 | | 3
1 | |
1 | |
1 | |
4 | | 6
4 | | 6
4 | | 6
4 | |
4 | |
4 | |
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | 8 |
7 | | 9
7 | | 9
7 | |
7 | |
7 | |
| |
(25 rows)
-- "group by distinct" behavior...
select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by distinct rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | | 6
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(13 rows)
-- ...which is not the same as "select distinct"
select distinct a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(11 rows)
-- test handling of outer GroupingFunc within subqueries
explain (costs off)
select (select grouping(v1)) from (values ((select 1))) v(v1) group by cube(v1);
QUERY PLAN
-------------------------------
MixedAggregate
Hash Key: (InitPlan 3).col1
Group Key: ()
InitPlan 1
-> Result
InitPlan 3
-> Result
-> Result
SubPlan 2
-> Result
(10 rows)
select (select grouping(v1)) from (values ((select 1))) v(v1) group by cube(v1);
grouping
----------
1
0
(2 rows)
explain (costs off)
select (select grouping(v1)) from (values ((select 1))) v(v1) group by v1;
QUERY PLAN
----------------
GroupAggregate
InitPlan 1
-> Result
InitPlan 3
-> Result
-> Result
SubPlan 2
-> Result
Remove redundant grouping and DISTINCT columns. Avoid explicitly grouping by columns that we know are redundant for sorting, for example we need group by only one of x and y in SELECT ... WHERE x = y GROUP BY x, y This comes up more often than you might think, as shown by the changes in the regression tests. It's nearly free to detect too, since we are just piggybacking on the existing logic that detects redundant pathkeys. (In some of the existing plans that change, it's visible that a sort step preceding the grouping step already didn't bother to sort by the redundant column, making the old plan a bit silly-looking.) To do this, build processed_groupClause and processed_distinctClause lists that omit any provably-redundant sort items, and consult those not the originals where relevant. This means that within the planner, one should usually consult root->processed_groupClause or root->processed_distinctClause if one wants to know which columns are to be grouped on; but to check whether grouping or distinct-ing is happening at all, check non-NIL-ness of parse->groupClause or parse->distinctClause. This is comparable to longstanding rules about handling the HAVING clause, so I don't think it'll be a huge maintenance problem. nodeAgg.c also needs minor mods, because it's now possible to generate AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
2023-01-18 18:37:57 +01:00
(8 rows)
select (select grouping(v1)) from (values ((select 1))) v(v1) group by v1;
grouping
----------
0
(1 row)
Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:40:59 +02:00
-- end