postgresql/src/test/regress/expected/select_distinct_on.out

--
-- SELECT_DISTINCT_ON
--
SELECT DISTINCT ON (string4) string4, two, ten
   FROM onek
   ORDER BY string4 using <, two using >, ten using <;
 string4 | two | ten 
---------+-----+-----
 AAAAxx  |   1 |   1
 HHHHxx  |   1 |   1
 OOOOxx  |   1 |   1
 VVVVxx  |   1 |   1
(4 rows)
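DISTINCT ON (string4) keeps only the first row of each group of rows that share a string4 value, and "first" is decided by the ORDER BY. The plain-Python sketch below models that behaviour on an in-memory list of tuples. The helper name `distinct_on` and the sample rows are hypothetical (they are not drawn from onek), and this models the semantics only, not how the executor implements them.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, sort_key):
    """Return the first row of each `key` group after sorting by `sort_key`.

    `sort_key` must order rows by `key` first (mirroring the rule that the
    DISTINCT ON expressions lead the ORDER BY), so groupby() sees each
    group as one contiguous run.
    """
    ordered = sorted(rows, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=key)]


# Hypothetical rows shaped like (string4, two, ten).
rows = [
    ("HHHHxx", 0, 4),
    ("AAAAxx", 1, 7),
    ("AAAAxx", 1, 1),
    ("AAAAxx", 0, 0),
    ("HHHHxx", 1, 3),
]

# ORDER BY string4 ASC, two DESC ("using >"), ten ASC.
# Negating the numeric column `two` gives the descending order.
result = distinct_on(
    rows,
    key=itemgetter(0),
    sort_key=lambda r: (r[0], -r[1], r[2]),
)
print(result)  # [('AAAAxx', 1, 1), ('HHHHxx', 1, 3)]
```

Each string4 group contributes exactly one row, chosen by the trailing sort keys, which is why the query above returns one row per string4 value.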

-- this will fail due to conflict of ordering requirements
SELECT DISTINCT ON (string4, ten) string4, two, ten
   FROM onek
   ORDER BY string4 using <, two using <, ten using <;
ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON (string4, ten) string4, two, ten
                                     ^
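The query is rejected because `two` sits between `string4` and `ten` in the ORDER BY, so rows that share (string4, ten) need not be adjacent in the sorted output. A simplified model of the check is sketched below: it walks the ORDER BY list and rejects any DISTINCT ON item that appears after an ORDER BY item that is not part of DISTINCT ON. The function name is hypothetical, and it compares plain column names, whereas PostgreSQL's parser works on resolved target-list entries. In this model the order of items inside DISTINCT ON does not matter, and DISTINCT ON items absent from the ORDER BY are simply not checked.

```python
def check_distinct_on(distinct_on, order_by):
    """Raise ValueError if a DISTINCT ON item follows an ORDER BY item
    that is not itself a DISTINCT ON item.

    Simplified model: items are compared by name only.  Items of
    `distinct_on` that never appear in `order_by` are not checked.
    """
    wanted = set(distinct_on)
    skipped_other_item = False
    for item in order_by:
        if item in wanted:
            if skipped_other_item:
                raise ValueError(
                    "SELECT DISTINCT ON expressions must match "
                    "initial ORDER BY expressions"
                )
        else:
            skipped_other_item = True


# Accepted: string4 and ten are the leading ORDER BY items.
check_distinct_on(["string4", "ten"], ["string4", "ten", "two"])

# Rejected: "two" separates string4 from ten.
try:
    check_distinct_on(["string4", "ten"], ["string4", "two", "ten"])
except ValueError as err:
    print(err)
```

The accepted call corresponds to the next query in this file, which reorders the ORDER BY so that ten directly follows string4.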
SELECT DISTINCT ON (string4, ten) string4, ten, two
   FROM onek
   ORDER BY string4 using <, ten using >, two using <;
 string4 | ten | two 
---------+-----+-----
 AAAAxx  |   9 |   1
 AAAAxx  |   8 |   0
 AAAAxx  |   7 |   1
 AAAAxx  |   6 |   0
 AAAAxx  |   5 |   1
 AAAAxx  |   4 |   0
 AAAAxx  |   3 |   1
 AAAAxx  |   2 |   0
 AAAAxx  |   1 |   1
 AAAAxx  |   0 |   0
 HHHHxx  |   9 |   1
 HHHHxx  |   8 |   0
 HHHHxx  |   7 |   1
 HHHHxx  |   6 |   0
 HHHHxx  |   5 |   1
 HHHHxx  |   4 |   0
 HHHHxx  |   3 |   1
 HHHHxx  |   2 |   0
 HHHHxx  |   1 |   1
 HHHHxx  |   0 |   0
 OOOOxx  |   9 |   1
 OOOOxx  |   8 |   0
 OOOOxx  |   7 |   1
 OOOOxx  |   6 |   0
 OOOOxx  |   5 |   1
 OOOOxx  |   4 |   0
 OOOOxx  |   3 |   1
 OOOOxx  |   2 |   0
 OOOOxx  |   1 |   1
 OOOOxx  |   0 |   0
 VVVVxx  |   9 |   1
 VVVVxx  |   8 |   0
 VVVVxx  |   7 |   1
 VVVVxx  |   6 |   0
 VVVVxx  |   5 |   1
 VVVVxx  |   4 |   0
 VVVVxx  |   3 |   1
 VVVVxx  |   2 |   0
 VVVVxx  |   1 |   1
 VVVVxx  |   0 |   0
(40 rows)

-- bug #5049: early 8.4.x chokes on volatile DISTINCT ON clauses
select distinct on (1) floor(random()) as r, f1 from int4_tbl order by 1,2;
 r |     f1      
---+-------------
 0 | -2147483647
(1 row)
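The expected output is deterministic despite random(): PostgreSQL's random() returns a value in the range 0.0 <= x < 1.0, so floor(random()) is 0 for every row. Every row therefore lands in the same DISTINCT ON group, and ORDER BY 1,2 makes the row with the smallest f1 the one that is kept. The Python sketch below mirrors that reasoning with random.random(), which has the same half-open range; the f1 values are hypothetical stand-ins, not the contents of int4_tbl.

```python
import math
import random

# Hypothetical f1 values; int4_tbl's real contents are not reproduced here.
f1_values = [42, -17, 0, 99, -3]

# Build (r, f1) pairs the way the query does: r = floor(random()).
# random.random() returns a float in [0.0, 1.0), so floor() is always 0.
pairs = [(math.floor(random.random()), f1) for f1 in f1_values]

# ORDER BY 1, 2, then keep the first row for each distinct r.
pairs.sort()
kept = {}
for r, f1 in pairs:
    kept.setdefault(r, (r, f1))

print(list(kept.values()))  # [(0, -17)]
```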

--
-- Test the planner's ability to use a LIMIT 1 instead of a Unique node when
-- all of the distinct_pathkeys have been marked as redundant
--
-- Ensure we also get a LIMIT plan with DISTINCT ON
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,two
   FROM tenk1 WHERE four = 0 ORDER BY 1;
            QUERY PLAN            
----------------------------------
 Result
   ->  Limit
         ->  Seq Scan on tenk1
               Filter: (four = 0)
(4 rows)
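With WHERE four = 0, every row that passes the filter has four = 0, so the DISTINCT ON (four) key is one constant for all of them and a Limit returning the first row gives the same answer as a full duplicate-elimination pass. The sketch below checks that equivalence on hypothetical rows; the function names are illustrative only. With ORDER BY 1 alone, which matching row comes back is not fixed by the query, and the sketch simply takes the first one in input order.

```python
from itertools import islice


def first_per_key(rows, key):
    """Keep the first row seen for each key value (what DISTINCT ON needs)."""
    seen = {}
    for row in rows:
        seen.setdefault(key(row), row)
    return list(seen.values())


def first_row_only(rows):
    """Limit-style pass: return at most one row, then stop reading."""
    return list(islice(rows, 1))


# Hypothetical (four, two) rows; not the contents of tenk1.
table = [(1, 1), (0, 0), (2, 0), (0, 0), (3, 1), (0, 0)]
matching = [row for row in table if row[0] == 0]  # WHERE four = 0

# The filter pins four to one value, so both passes return the same row.
assert first_per_key(matching, key=lambda r: r[0]) == first_row_only(matching)
print(first_row_only(matching))  # [(0, 0)]
```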

-- and check the result of the above query is correct
SELECT DISTINCT ON (four) four,two
   FROM tenk1 WHERE four = 0 ORDER BY 1;
 four | two 
------+-----
    0 |   0
(1 row)

-- Ensure a Sort -> Limit is used when the ORDER BY contains additional cols
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,two
   FROM tenk1 WHERE four = 0 ORDER BY 1,2;
            QUERY PLAN            
----------------------------------
 Limit
   ->  Sort
         Sort Key: two
         ->  Seq Scan on tenk1
               Filter: (four = 0)
(5 rows)
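Adding `two` to the ORDER BY means the Limit must return the matching row with the lowest `two`, so a Sort is placed beneath it. Sorting and keeping one row is equivalent to selecting the smallest row under the sort key, which the sketch below checks with `heapq.nsmallest`, a bounded selection that avoids a full sort. PostgreSQL can run a Sort under a Limit as a bounded top-N heapsort (EXPLAIN ANALYZE reports "Sort Method: top-N heapsort"), but this COSTS OFF plan does not show which method is used. The rows are hypothetical and not taken from tenk1.

```python
import heapq

# Hypothetical (four, two) rows that already satisfy WHERE four = 0.
matching = [(0, 3), (0, 1), (0, 2), (0, 1)]

# Sort -> Limit 1: fully sort on ORDER BY four, two and keep the first row.
via_full_sort = sorted(matching, key=lambda r: (r[0], r[1]))[:1]

# Bounded alternative: track only the single smallest row while scanning.
via_top_n = heapq.nsmallest(1, matching, key=lambda r: (r[0], r[1]))

assert via_full_sort == via_top_n
print(via_top_n)  # [(0, 1)]
```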

-- Same again but use a column that is indexed so that we get an index scan
-- then a limit
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) four,hundred
   FROM tenk1 WHERE four = 0 ORDER BY 1,2;
                     QUERY PLAN                      
-----------------------------------------------------
 Result
   ->  Limit
         ->  Index Scan using tenk1_hundred on tenk1
               Filter: (four = 0)
(4 rows)
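Because four is pinned to 0, ORDER BY 1,2 reduces to ordering by hundred, which the tenk1_hundred index already supplies, so no Sort node is needed: the Limit takes the first row that passes the four = 0 filter and the scan can stop there. The sketch below models this with a Python generator standing in for the index scan; `index_scan`, `stats`, and the rows are hypothetical. The counter shows that only the rows up to the first match are read.

```python
from itertools import islice


def index_scan(rows, stats):
    """Stand-in for an index scan: yield rows lazily, already in index order."""
    for row in rows:
        stats["read"] += 1
        yield row


# Hypothetical (four, hundred) rows, listed in ascending "hundred" order
# as an index on that column would return them.  Made-up data.
index_order = [(1, 0), (3, 0), (0, 0), (2, 1), (0, 1), (0, 2)]

stats = {"read": 0}
# Filter: (four = 0) applied to each scanned row, then Limit 1.
matches = (row for row in index_scan(index_order, stats) if row[0] == 0)
first = list(islice(matches, 1))

print(first, stats["read"])  # [(0, 0)] 3
```

islice stops requesting rows once it has produced one, so the remaining three rows are never read.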
Commit history of this file (oldest first):

1997-04-06 10:29:57 +02:00
    More splits and cleanups... Its starting to actually take shape and look as expected...

2000-01-06 07:40:54 +01:00
    Update for new psql formatting.

2000-01-27 19:11:50 +01:00
    Redesign DISTINCT ON as discussed in pgsql-sql 1/25/00: syntax is now
        SELECT DISTINCT ON (expr [, expr ...]) targetlist ...
    and there is a check to make sure that the user didn't specify an ORDER BY that's incompatible with the DISTINCT operation.

    Reimplement nodeUnique and nodeGroup to use the proper datatype-specific equality function for each column being compared --- they used to do bitwise comparisons or convert the data to text strings and strcmp(). (To add insult to injury, they'd look up the conversion functions once for each tuple...)

    Parse/plan representation of DISTINCT is now a list of SortClause nodes. initdb forced by querytree change...

2008-09-01 22:42:46 +02:00
    Add a bunch of new error location reports to parse-analysis error messages. There are still some weak spots around JOIN USING and relation alias lists, but most errors reported within backend/parser/ now have locations.

2009-09-12 02:04:59 +02:00
    Fix assertion failure when a SELECT DISTINCT ON expression is volatile.

    In this case we generate two PathKey references to the expression (one for DISTINCT and one for ORDER BY) and they really need to refer to the same EquivalenceClass. However get_eclass_for_sort_expr was being overly paranoid and creating two different EC's. Correct behavior is to use the SortGroupRef index to decide whether two references to volatile expressions that are equal() (ie textually equivalent) should be considered the same.

    Backpatch to 8.4. Possibly this should be changed in 8.3 as well, but I'll refrain in the absence of evidence of a visible failure in that branch.

    Per bug #5049.

2022-02-08 21:30:38 +01:00
    Rearrange core regression tests to reduce cross-script dependencies.

    The idea behind this patch is to make it possible to run individual test scripts without running the entire core test suite. Making all the scripts completely independent would involve a massive rewrite, and would probably be worse for coverage of things like concurrent DDL. So this patch just does what seems practical with limited changes.

    The net effect is that any test script can be run after running limited earlier dependencies:
    * all scripts depend on test_setup
    * many scripts depend on create_index
    * other dependencies are few in number, and are documented in the parallel_schedule file.

    To accomplish this, I chose a small number of commonly-used tables and moved their creation and filling into test_setup. Later scripts are expected not to modify these tables' data contents, for fear of affecting other scripts' results. Also, our former habit of declaring all C functions in one place is now gone in favor of declaring them where they're used, if that's just one script, or in test_setup if necessary.

    There's more that could be done to remove some of the remaining inter-script dependencies, but significantly more-invasive changes would be needed, and at least for now it doesn't seem worth it.

    Discussion: https://postgr.es/m/1114748.1640383217@sss.pgh.pa.us

2022-10-28 12:04:38 +02:00
    Use Limit instead of Unique to implement DISTINCT, when possible

    When all of the query's DISTINCT pathkeys have been marked as redundant due to EquivalenceClasses existing which contain constants, we can just implement the DISTINCT operation on a query by just limiting the number of returned rows to 1 instead of performing a Unique on all of the matching (duplicate) rows. This applies in cases such as:

        SELECT DISTINCT col,col2 FROM tab WHERE col = 1 AND col2 = 10;

    If there are any matching rows, then they must all be {1,10}. There's no point in fetching all of those and running a Unique operator on them to leave only a single row. Here we effectively just find the first row and then stop. We are obviously unable to apply this optimization if either the col = 1 or col2 = 10 were missing from the WHERE clause or if there were any additional columns in the SELECT clause.

    Such queries are probably not all that common, but detecting when we can apply this optimization amounts to checking if the distinct_pathkeys are NULL, which is very cheap indeed.

    Nothing is done here to check if the query already has a LIMIT clause. If it does then the plan may end up with 2 Limits nodes. There's no harm in that and it's probably not worth the complexity to unify them into a single Limit node.

    Author: David Rowley
    Reviewed-by: Richard Guo
    Discussion: https://postgr.es/m/CAApHDvqS0j8RUWRUSgCAXxOqnYjHUXmKwspRj4GzVfOO25ByHA@mail.gmail.com
    Discussion: https://postgr.es/m/MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com