postgresql/src/test/regress/expected/select_distinct_on.out

2000-01-06 07:40:54 +01:00
--
-- SELECT_DISTINCT_ON
--
SELECT DISTINCT ON (string4) string4, two, ten
FROM onek
ORDER BY string4 using <, two using >, ten using <;
string4 | two | ten
---------+-----+-----
AAAAxx | 1 | 1
HHHHxx | 1 | 1
OOOOxx | 1 | 1
VVVVxx | 1 | 1
(4 rows)
-- this will fail due to conflict of ordering requirements
SELECT DISTINCT ON (string4, ten) string4, two, ten
FROM onek
ORDER BY string4 using <, two using <, ten using <;
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON (string4, ten) string4, two, ten
                                     ^
SELECT DISTINCT ON (string4, ten) string4, ten, two
FROM onek
ORDER BY string4 using <, ten using >, two using <;
string4 | ten | two
---------+-----+-----
AAAAxx | 9 | 1
AAAAxx | 8 | 0
AAAAxx | 7 | 1
AAAAxx | 6 | 0
AAAAxx | 5 | 1
AAAAxx | 4 | 0
AAAAxx | 3 | 1
AAAAxx | 2 | 0
AAAAxx | 1 | 1
AAAAxx | 0 | 0
HHHHxx | 9 | 1
HHHHxx | 8 | 0
HHHHxx | 7 | 1
HHHHxx | 6 | 0
HHHHxx | 5 | 1
HHHHxx | 4 | 0
HHHHxx | 3 | 1
HHHHxx | 2 | 0
HHHHxx | 1 | 1
HHHHxx | 0 | 0
OOOOxx | 9 | 1
OOOOxx | 8 | 0
OOOOxx | 7 | 1
OOOOxx | 6 | 0
OOOOxx | 5 | 1
OOOOxx | 4 | 0
OOOOxx | 3 | 1
OOOOxx | 2 | 0
OOOOxx | 1 | 1
OOOOxx | 0 | 0
VVVVxx | 9 | 1
VVVVxx | 8 | 0
VVVVxx | 7 | 1
VVVVxx | 6 | 0
VVVVxx | 5 | 1
VVVVxx | 4 | 0
VVVVxx | 3 | 1
VVVVxx | 2 | 0
VVVVxx | 1 | 1
VVVVxx | 0 | 0
(40 rows)
-- bug #5049: early 8.4.x chokes on volatile DISTINCT ON clauses
select distinct on (1) floor(random()) as r, f1 from int4_tbl order by 1,2;
r | f1
---+-------------
0 | -2147483647
(1 row)
Use Limit instead of Unique to implement DISTINCT, when possible

When all of the query's DISTINCT pathkeys have been marked as redundant due to EquivalenceClasses existing which contain constants, we can just implement the DISTINCT operation on a query by just limiting the number of returned rows to 1 instead of performing a Unique on all of the matching (duplicate) rows.

This applies in cases such as:

SELECT DISTINCT col,col2 FROM tab WHERE col = 1 AND col2 = 10;

If there are any matching rows, then they must all be {1,10}. There's no point in fetching all of those and running a Unique operator on them to leave only a single row. Here we effectively just find the first row and then stop. We are obviously unable to apply this optimization if either the col = 1 or col2 = 10 were missing from the WHERE clause or if there were any additional columns in the SELECT clause.

Such queries are probably not all that common, but detecting when we can apply this optimization amounts to checking if the distinct_pathkeys are NULL, which is very cheap indeed.

Nothing is done here to check if the query already has a LIMIT clause. If it does then the plan may end up with 2 Limits nodes. There's no harm in that and it's probably not worth the complexity to unify them into a single Limit node.

Author: David Rowley
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/CAApHDvqS0j8RUWRUSgCAXxOqnYjHUXmKwspRj4GzVfOO25ByHA@mail.gmail.com
Discussion: https://postgr.es/m/MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com
2022-10-28 12:04:38 +02:00
--
-- Test the planner's ability to use a LIMIT 1 instead of a Unique node when
-- all of the distinct_pathkeys have been marked as redundant
--
-- Ensure we also get a LIMIT plan with DISTINCT ON
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,two
FROM tenk1 WHERE four = 0 ORDER BY 1;
QUERY PLAN
----------------------------------
 Result
   ->  Limit
         ->  Seq Scan on tenk1
               Filter: (four = 0)
(4 rows)
-- and check the result of the above query is correct
SELECT DISTINCT ON (four) four,two
FROM tenk1 WHERE four = 0 ORDER BY 1;
four | two
------+-----
0 | 0
(1 row)
-- Ensure a Sort -> Limit is used when the ORDER BY contains additional cols
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,two
FROM tenk1 WHERE four = 0 ORDER BY 1,2;
QUERY PLAN
----------------------------------
 Limit
   ->  Sort
         Sort Key: two
         ->  Seq Scan on tenk1
               Filter: (four = 0)
(5 rows)
-- Same again but use a column that is indexed so that we get an index scan
-- then a limit
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,hundred
FROM tenk1 WHERE four = 0 ORDER BY 1,2;
QUERY PLAN
-----------------------------------------------------
 Result
   ->  Limit
         ->  Index Scan using tenk1_hundred on tenk1
               Filter: (four = 0)
(4 rows)
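--
-- Illustrative sketch only, not part of the upstream expected output.
-- It restates the plain DISTINCT case from the 2022-10-28 commit message
-- against tenk1, on the assumption that columns four and two can stand in
-- for the message's "col" and "col2".
-- Every DISTINCT column is pinned to a constant by the WHERE clause, so per
-- the commit message the planner should be able to use a Limit in place of
-- a Unique. Plan output is omitted because it has not been verified here.
--
EXPLAIN (COSTS OFF)
SELECT DISTINCT four, two
FROM tenk1 WHERE four = 0 AND two = 0;
-- Counter-case from the same message: two is no longer pinned to a
-- constant, so the Limit-only shortcut is not expected to apply.
EXPLAIN (COSTS OFF)
SELECT DISTINCT four, two
FROM tenk1 WHERE four = 0;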