postgresql/src/test/regress/sql
Tomas Vondra db0d67db24 Optimize order of GROUP BY keys
When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may be heavily dependent on the order in which the keys are
compared when building the groups. Grouping does not imply any ordering,
so we're allowed to compare the keys in arbitrary order, and a Hash Agg
leverages this. But for Group Agg, we simply compared keys in the order
as specified in the query. This commit explores alternative ordering of
the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple interesting orderings based
on both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because maybe the cheaper comparisons will be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
using either incremental sort or even eliminate the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.

This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.

The original patch was proposed by Teodor Sigaev, and later improved and
reworked by Dmitry Dolgov. Reviews by a number of people, including me,
Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.

Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
2022-03-31 01:13:33 +02:00
..
advisory_lock.sql
aggregates.sql Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
alter_generic.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
alter_operator.sql Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. 2018-01-26 18:25:14 -05:00
alter_table.sql Preparatory test cleanup 2022-03-28 15:22:34 +02:00
amutils.sql Add support for nearest-neighbor (KNN) searches to SP-GiST 2018-09-19 01:54:10 +03:00
arrays.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
async.sql
bit.sql Add bit_count SQL function 2021-03-23 10:13:58 +01:00
bitmapops.sql
boolean.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
box.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
brin_bloom.sql BRIN bloom indexes 2021-03-26 13:35:32 +01:00
brin_multi.sql Fix handling of NaN values in BRIN minmax multi 2021-11-06 01:50:44 +01:00
brin.sql Move test for BRIN HOT behavior to stats.sql 2021-12-11 05:32:35 +01:00
btree_index.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
case.sql Add support for NullIfExpr in eval_const_expressions 2021-04-02 11:01:49 +02:00
char.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
circle.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
cluster.sql Avoid possible crash while finishing up a heap rewrite. 2021-03-23 11:24:16 -04:00
collate.icu.utf8.sql Add option to use ICU as global locale provider 2022-03-17 11:13:16 +01:00
collate.linux.utf8.sql Database-level collation version tracking 2022-02-14 08:27:26 +01:00
collate.sql Improve error checking of CREATE COLLATION options. 2021-07-18 11:08:34 +01:00
combocid.sql Sanitize the term "combo CID" in code comments 2021-03-25 16:08:03 +09:00
comments.sql
compression.sql Remove forced toast recompression in VACUUM FULL/CLUSTER 2021-06-14 09:25:50 +09:00
constraints.sql Add UNIQUE null treatment option 2022-02-03 11:48:21 +01:00
conversion.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
copy2.sql Fix handling of redundant options with COPY for "freeze" and "header" 2020-10-05 09:43:17 +09:00
copy.sql Add header matching mode to COPY FROM 2022-03-30 09:02:31 +02:00
copydml.sql Extend a test case a little 2021-02-26 09:11:15 +01:00
copyselect.sql Revert "psql: Show all query results by default" 2021-04-15 19:42:55 +02:00
create_aggregate.sql Introduce "anycompatible" family of polymorphic types. 2020-03-19 11:43:11 -04:00
create_am.sql Improve handling of SET ACCESS METHOD for ALTER MATERIALIZED VIEW 2022-03-19 19:13:52 +09:00
create_cast.sql
create_function_c.sql Rename create_function_N test scripts for clarity. 2022-02-08 15:40:08 -05:00
create_function_sql.sql Fix collection of typos in the code and the documentation 2022-03-15 11:29:35 +09:00
create_index_spgist.sql Add a planner support function for starts_with(). 2021-11-17 16:54:12 -05:00
create_index.sql Fix memory leak in IndexScan node with reordering 2022-02-14 04:17:04 +03:00
create_misc.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
create_operator.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
create_procedure.sql Reconsider the handling of procedure OUT parameters. 2021-06-10 17:11:36 -04:00
create_role.sql Add tests of the CREATEROLE attribute 2022-01-24 15:34:19 -05:00
create_table_like.sql Extended statistics on expressions 2021-03-27 00:01:11 +01:00
create_table.sql Fix collection of typos in the code and the documentation 2022-03-15 11:29:35 +09:00
create_type.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
create_view.sql Add support for security invoker views. 2022-03-22 10:28:10 +00:00
date.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
dbsize.sql Teach pg_size_pretty and pg_size_bytes about petabytes 2021-07-09 18:56:00 +12:00
delete.sql
dependency.sql Un-hide most cascaded-drop details in regression test results. 2019-03-24 19:15:37 -04:00
domain.sql Fix assignment to array of domain over composite. 2021-10-19 13:54:45 -04:00
drop_if_exists.sql Introduce the 'force' option for the Drop Database command. 2019-11-13 08:25:33 +05:30
drop_operator.sql
enum.sql Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). 2018-10-09 12:51:01 +13:00
equivclass.sql Suppress unnecessary RelabelType nodes in more cases. 2020-02-26 18:14:12 -05:00
errors.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
event_trigger.sql Improve handling of dropped objects in pg_event_trigger_ddl_commands() 2021-06-14 14:57:22 +09:00
explain.sql Force track_io_timing off in explain.sql to avoid failures when enabled. 2022-03-12 14:21:40 -08:00
expressions.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
fast_default.sql Don't set a fast default for anything but a plain table 2021-06-18 06:51:12 -04:00
float4.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
float8.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
foreign_data.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
foreign_key.sql Enforce foreign key correctly during cross-partition updates 2022-03-20 18:43:40 +01:00
functional_deps.sql
generated.sql Fix bogus dependency handling for GENERATED expressions. 2022-03-21 14:58:49 -04:00
geometry.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
gin.sql Improve test coverage of ginvacuum.c. 2020-09-01 18:40:43 -04:00
gist.sql Don't use_physical_tlist for an IOS with non-returnable columns. 2022-02-11 15:24:02 -05:00
groupingsets.sql Fix assorted missing logic for GroupingFunc nodes. 2022-03-21 17:44:29 -04:00
guc.sql Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
hash_func.sql Fix portability issue in tests from commit ce773f230. 2021-09-03 10:01:02 -04:00
hash_index.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
hash_part.sql Fix typo in test comment. 2020-05-28 12:35:18 +03:00
horology.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
identity.sql Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
incremental_sort.sql Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
index_including_gist.sql Support for INCLUDE attributes in GiST indexes 2019-03-10 11:37:17 +03:00
index_including.sql Support INCLUDE'd columns in SP-GiST. 2021-04-05 18:41:21 -04:00
indexing.sql Raise error on concurrent drop of partitioned index 2020-09-01 13:40:43 -04:00
indirect_toast.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
inet.sql Add test case for abbrev(cidr) 2021-02-11 09:56:14 +01:00
infinite_recurse.sql Paper over regression failures in infinite_recurse() on PPC64 Linux. 2020-10-13 17:44:56 -04:00
inherit.sql Allow ordered partition scans in more cases 2021-08-03 12:25:52 +12:00
init_privs.sql
insert_conflict.sql Allow table-qualified variable names in ON CONFLICT ... WHERE. 2021-04-13 15:39:41 -04:00
insert.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
int2.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
int4.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
int8.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
interval.sql Handle integer overflow in interval justification functions. 2022-02-28 15:36:54 -05:00
join_hash.sql Increase hash_mem_multiplier default to 2.0. 2022-02-16 18:41:52 -08:00
join.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
json_encoding.sql Allow Unicode escapes in any server encoding, not only UTF-8. 2020-03-06 14:17:43 -05:00
json_sqljson.sql SQL/JSON query functions 2022-03-29 16:57:13 -04:00
json.sql Improve reporting for syntax errors in multi-line JSON data. 2021-03-01 16:44:17 -05:00
jsonb_jsonpath.sql Support for ISO 8601 in the jsonpath .datetime() method 2020-09-29 12:00:04 +03:00
jsonb_sqljson.sql SQL/JSON query functions 2022-03-29 16:57:13 -04:00
jsonb.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
jsonpath_encoding.sql Allow Unicode escapes in any server encoding, not only UTF-8. 2020-03-06 14:17:43 -05:00
jsonpath.sql Make JSON path numeric literals more correct 2022-03-28 11:11:39 +02:00
largeobject.sql Extend psql's \lo_list/\dl to be able to print large objects' ACLs. 2022-01-06 13:09:05 -05:00
limit.sql Error out if SKIP LOCKED and WITH TIES are both specified 2021-10-01 18:29:18 -03:00
line.sql Improve test coverage of geometric types 2018-09-26 10:45:21 +02:00
lock.sql Add support for security invoker views. 2022-03-22 10:28:10 +00:00
lseg.sql Improve test coverage of geometric types 2018-09-26 10:45:21 +02:00
macaddr8.sql Add support for EUI-64 MAC addresses as macaddr8 2017-03-15 11:16:25 -04:00
macaddr.sql
matview.sql Really fix the ambiguity in REFRESH MATERIALIZED VIEW CONCURRENTLY. 2021-08-07 13:29:32 -04:00
memoize.sql Increase hash_mem_multiplier default to 2.0. 2022-02-16 18:41:52 -08:00
merge.sql Fix role names in merge.sql regress file 2022-03-28 17:10:36 +02:00
misc_functions.sql Add more regression tests for pg_ls_dir() 2022-03-15 10:52:19 +09:00
misc_sanity.sql Replace explicit PIN entries in pg_depend with an OID range test. 2021-07-15 11:41:47 -04:00
misc.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
money.sql Fix loss of fractional digits for large values in cash_numeric(). 2019-07-26 11:59:00 -04:00
multirangetypes.sql Add range_agg with multirange inputs 2022-03-30 20:16:23 +02:00
mvcc.sql Increment xactCompletionCount during subtransaction abort. 2021-04-06 09:24:50 -07:00
name.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
namespace.sql Clean up duplicate role and schema names in regression tests. 2018-03-15 14:00:31 -04:00
numeric_big.sql Fix corner-case loss of precision in numeric ln(). 2020-03-01 14:49:25 +00:00
numeric.sql Fix corner-case loss of precision in numeric_power(). 2021-10-06 13:16:51 +01:00
numerology.sql Reject trailing junk after numeric literals 2022-02-16 10:37:31 +01:00
object_address.sql Add decoding of sequences to built-in replication 2022-03-24 18:49:27 +01:00
oid.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
oidjoins.sql Build in some knowledge about foreign-key relationships in the catalogs. 2021-02-02 17:11:55 -05:00
opr_sanity.sql SQL/JSON constructors 2022-03-27 17:03:34 -04:00
partition_aggregate.sql Move per-agg and per-trans duplicate finding to the planner. 2020-11-24 10:45:00 +02:00
partition_info.sql Fix crash with pg_partition_root 2019-03-22 17:27:38 +09:00
partition_join.sql Consider fractional paths in generate_orderedappend_paths 2022-01-12 22:27:24 +01:00
partition_prune.sql Change the name of the Result Cache node to Memoize 2021-07-14 12:43:58 +12:00
password.sql Change default of password_encryption to scram-sha-256 2020-06-10 16:42:55 +02:00
path.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
pg_lsn.sql Add +(pg_lsn,numeric) and -(pg_lsn,numeric) operators. 2020-06-30 23:55:07 +09:00
plancache.sql Add generic_plans and custom_plans fields into pg_prepared_statements. 2020-07-20 11:55:50 +09:00
plpgsql.sql Test and document the behavior of initialization cross-refs in plpgsql. 2021-10-29 12:45:33 -04:00
point.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
polygon.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
polymorphism.sql Fix bugs in polymorphic-argument resolution for multiranges. 2021-07-27 15:01:49 -04:00
portals_p2.sql
portals.sql Fix some anomalies with NO SCROLL cursors. 2021-09-10 13:18:32 -04:00
prepare.sql Add more tests for CREATE TABLE AS with WITH NO DATA 2019-02-07 09:21:57 +09:00
prepared_xacts.sql Fix check for conflicting session- vs transaction-level locks. 2021-07-24 18:35:52 -04:00
privileges.sql Remove the ability of a role to administer itself. 2022-03-28 13:38:13 -04:00
psql_crosstab.sql
psql.sql Fix thinko in psql test 2022-01-18 16:53:41 +01:00
publication.sql Allow specifying column lists for logical replication 2022-03-26 01:01:27 +01:00
random.sql Remove gratuitous uses of deprecated SELECT INTO 2021-01-28 14:28:41 +01:00
rangefuncs.sql Fix planner error with pulling up subquery expressions into function RTEs. 2021-10-14 12:43:55 -04:00
rangetypes.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
regex.sql Fix regexp misbehavior with capturing parens inside "{0}". 2021-08-24 16:37:26 -04:00
regproc.sql Implement type regcollation 2020-03-18 21:21:00 +01:00
reindex_catalog.sql Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers. 2020-09-09 18:50:24 -07:00
reloptions.sql Try to stabilize reloptions test, again. 2022-01-20 23:10:40 +13:00
replica_identity.sql Block ALTER TABLE .. DROP NOT NULL on columns in replica identity index 2021-11-25 15:04:56 +09:00
returning.sql
roleattributes.sql Remove WITH OIDS support, change oid catalog column visibility. 2018-11-20 16:00:17 -08:00
rowsecurity.sql Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
rowtypes.sql Revert applying column aliases to the output of whole-row Vars. 2022-03-17 18:18:05 -04:00
rules.sql Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
sanity_check.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
security_label.sql
select_distinct_on.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
select_distinct.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
select_having.sql
select_implicit.sql Remove gratuitous uses of deprecated SELECT INTO 2021-01-28 14:28:41 +01:00
select_into.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
select_parallel.sql Fix mis-planning of repeated application of a projection. 2021-05-31 12:03:00 -04:00
select_views.sql Avoid locale-dependent output in select_views regression test. 2017-05-28 14:52:18 -04:00
select.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
sequence.sql Make command order in test more sensible 2019-10-22 10:35:54 +02:00
spgist.sql Fix SP-GiST scan initialization logic for binary-compatible cases. 2021-11-20 14:29:56 -05:00
sqljson.sql SQL JSON functions 2022-03-30 16:30:37 -04:00
stats_ext.sql Add stxdinherit flag to pg_statistic_ext_data 2022-01-16 13:38:01 +01:00
stats.sql Mark pg_stat_get_subscription_stats() strict. 2022-03-27 21:47:26 -07:00
strings.sql Let regexp_replace() make use of REG_NOSUB when feasible. 2021-08-09 20:53:25 -04:00
subscription.sql Add ALTER SUBSCRIPTION ... SKIP. 2022-03-22 07:11:19 +05:30
subselect.sql Teach hash_ok_operator() that record_eq is only sometimes hashable. 2022-01-16 16:39:26 -05:00
sysviews.sql Add system view pg_ident_file_mappings 2022-03-29 10:15:48 +09:00
tablesample.sql Fix some anomalies with NO SCROLL cursors. 2021-09-10 13:18:32 -04:00
tablespace.sql Add regression tests for ALTER MATERIALIZED VIEW with tablespaces 2022-03-19 17:28:50 +09:00
temp.sql Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables. 2019-12-01 13:09:26 -05:00
test_setup.sql Set synchronous_commit=on in test_setup.sql. 2022-03-12 14:15:25 -08:00
text.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
tid.sql Tighten overflow checks in tidin(). 2022-03-03 20:04:35 -05:00
tidrangescan.sql Add TID Range Scans to support efficient scanning ranges of TIDs 2021-02-27 22:59:36 +13:00
tidscan.sql Fix bug in Tid scan. 2020-02-07 22:06:31 +09:00
time.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
timestamp.sql Disallow negative strides in date_bin() 2021-07-28 12:10:12 -04:00
timestamptz.sql Support "of", "tzh", and "tzm" format codes. 2022-03-14 16:50:54 -04:00
timetz.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
transactions.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
triggers.sql Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
truncate.sql Fix TRUNCATE .. CASCADE on partitions 2020-02-07 17:09:36 -03:00
tsdicts.sql Preserve integer and float values accurately in (de)serialize_deflist. 2020-03-10 12:30:02 -04:00
tsearch.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
tsrf.sql Fix handling of targetlist SRFs when scan/join relation is known empty. 2019-03-07 14:22:13 -05:00
tstypes.sql Disallow making an empty lexeme via array_to_tsvector(). 2021-11-06 13:28:53 -04:00
tuplesort.sql Fix some typos, grammar and style in docs and comments 2021-02-24 16:13:17 +09:00
txid.sql Introduce xid8-based functions to replace txid_XXX. 2020-04-07 12:04:32 +12:00
type_sanity.sql Put typtype letters back into consistent order 2022-02-22 10:11:38 +01:00
typed_table.sql Suppress less info in regression tests using DROP CASCADE. 2017-08-01 16:49:23 -04:00
unicode.sql Fix buffer overrun in unicode string normalization with empty input 2021-11-11 15:00:59 +09:00
union.sql Disable anonymous record hash support except in special cases 2021-09-08 09:55:04 +02:00
updatable_views.sql Add support for security invoker views. 2022-03-22 10:28:10 +00:00
update.sql Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists. 2021-05-10 11:02:29 -04:00
uuid.sql Add gen_random_uuid function 2019-07-14 14:30:27 +02:00
vacuum_parallel.sql Don't overlook indexes during parallel VACUUM. 2021-11-02 12:06:17 -07:00
vacuum.sql Try to stabilize vacuum test. 2022-03-23 15:06:25 +13:00
varchar.sql Rearrange core regression tests to reduce cross-script dependencies. 2022-02-08 15:30:38 -05:00
window.sql Add tests for UNBOUNDED syntax ambiguity 2021-07-01 09:27:05 +02:00
with.sql Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
write_parallel.sql Enable parallelism in REFRESH MATERIALIZED VIEW. 2021-03-17 15:04:17 +13:00
xid.sql Add min() and max() aggregates for xid8. 2022-02-10 12:33:41 +09:00
xml.sql Avoid failure when selecting a namespace node in XMLTABLE. 2019-10-25 15:22:45 -04:00
xmlmap.sql Fix incorrect xmlschema output for types timetz and timestamptz. 2022-03-18 16:01:42 -04:00