postgresql/contrib
Tomas Vondra db0d67db24 Optimize order of GROUP BY keys
When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may be heavily dependent on the order in which the keys are
compared when building the groups. Grouping does not imply any ordering,
so we're allowed to compare the keys in arbitrary order, and a Hash Agg
leverages this. But for Group Agg, we simply compared keys in the order
specified in the query. This commit explores alternative orderings of
the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple of interesting orderings based
on both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because the cheaper comparisons may already be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.
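
As a rough illustration of this intuition (not the planner's actual cost
model, which is more involved), the expected cost of comparing two tuples
can be modeled as the sum of each key's comparison cost weighted by the
probability that all earlier keys compared equal. The sketch assumes
independent, uniformly distributed keys, so two values of a key with n
distinct values are equal with probability 1/n. The names GroupKey,
expected_compare_cost and cheapest_order, and the sample statistics, are
hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class GroupKey:
    name: str
    ndistinct: int    # assumed estimate of the number of distinct values
    cmp_cost: float   # assumed relative cost of one comparison


def expected_compare_cost(keys):
    """Expected cost of comparing two tuples key by key,
    stopping at the first key that differs (toy model)."""
    total = 0.0
    p_reached = 1.0  # probability that every earlier key compared equal
    for key in keys:
        total += p_reached * key.cmp_cost
        p_reached *= 1.0 / key.ndistinct
    return total


def cheapest_order(keys):
    """Brute-force search over all permutations; fine for a few keys."""
    return min(permutations(keys), key=expected_compare_cost)


# Hypothetical statistics: an expensive, low-cardinality text key and a
# cheap, high-cardinality integer key.
keys = [
    GroupKey("description", ndistinct=10, cmp_cost=8.0),
    GroupKey("id", ndistinct=100_000, cmp_cost=1.0),
]
best = cheapest_order(keys)
print([k.name for k in best])
```

Under these assumed numbers the cheap, high-cardinality key sorts first,
because it almost always decides the comparison before the expensive key
is touched.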

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
using incremental sort or even eliminating the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.
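
The selection described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the code in the
patch: candidate_orderings, unit_cost and the ndistinct figures are
hypothetical, every comparison is assumed to cost the same, and only one
"useful for upper operations" ordering (the one matching the longest
ORDER BY prefix) is considered.

```python
from itertools import permutations


def candidate_orderings(group_keys, cost, order_by=()):
    """Return a short, de-duplicated list of GROUP BY key orderings."""
    candidates = []

    def add(ordering):
        ordering = tuple(ordering)
        if ordering not in candidates:
            candidates.append(ordering)

    # 1. The ordering with the lowest estimated comparison cost.
    add(min(permutations(group_keys), key=cost))

    # 2. An ordering that starts with the longest ORDER BY prefix made
    #    up of grouping keys, so a later sort might be cheaper or skipped.
    prefix = []
    for col in order_by:
        if col not in group_keys or col in prefix:
            break
        prefix.append(col)
    if prefix:
        add(prefix + [k for k in group_keys if k not in prefix])

    # 3. The ordering as written in the query is always kept.
    add(group_keys)
    return candidates


# Hypothetical distinct-value estimates for three grouping columns.
ndistinct = {"a": 2, "b": 1000, "c": 50}


def unit_cost(ordering):
    """Toy cost: each comparison costs 1 and is reached only if all
    earlier keys compared equal (probability 1/ndistinct each)."""
    total, p_reached = 0.0, 1.0
    for col in ordering:
        total += p_reached
        p_reached /= ndistinct[col]
    return total


result = candidate_orderings(["a", "b", "c"], unit_cost, order_by=["c", "a"])
print(result)
```

With these assumed statistics the sketch yields three candidates: the
cheapest-to-compare ordering, one aligned with the ORDER BY columns, and
the ordering given in the query.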

This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.

The original patch was proposed by Teodor Sigaev, and later improved and
reworked by Dmitry Dolgov. Reviews by a number of people, including me,
Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.

Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
2022-03-31 01:13:33 +02:00
adminpack Use has_privs_for_roles for predefined role checks 2022-03-28 15:10:04 -04:00
amcheck Harden TAP tests that intentionally corrupt page checksums. 2022-03-25 14:23:26 -04:00
auth_delay Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
auto_explain Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
basebackup_to_shell basebackup_to_shell: Add TAP test. 2022-03-30 15:47:02 -04:00
basic_archive Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
bloom Add new block-by-block strategy for CREATE DATABASE. 2022-03-29 11:48:36 -04:00
bool_plperl Fix broken ruleutils support for function TRANSFORM clauses. 2021-01-25 13:03:43 -05:00
btree_gin Fix failure of btree_gin indexscans with "char" type and </<= operators. 2021-08-10 18:10:29 -04:00
btree_gist Fix results of index-only scans on btree_gist char(N) indexes. 2022-01-08 14:54:39 -05:00
citext Enable routine running of citext's UTF8-specific test cases. 2022-01-05 13:30:07 -05:00
cube Add binary I/O capability for cube datatype. 2021-03-06 12:04:05 -05:00
dblink Simplify SRFs using materialize mode in contrib/ modules 2022-03-08 10:12:22 +09:00
dict_int Update copyright for 2022 2022-01-07 19:04:57 -05:00
dict_xsyn Update copyright for 2022 2022-01-07 19:04:57 -05:00
earthdistance Make contrib modules' installation scripts more secure. 2020-08-10 10:44:42 -04:00
file_fdw Add header matching mode to COPY FROM 2022-03-30 09:02:31 +02:00
fuzzystrmatch Update copyright for 2022 2022-01-07 19:04:57 -05:00
hstore Update copyright for 2022 2022-01-07 19:04:57 -05:00
hstore_plperl Make contrib modules' installation scripts more secure. 2020-08-10 10:44:42 -04:00
hstore_plpython plpython: Code cleanup related to removal of Python 2 support. 2022-03-07 18:30:28 -08:00
intagg Make contrib modules' installation scripts more secure. 2020-08-10 10:44:42 -04:00
intarray Update copyright for 2022 2022-01-07 19:04:57 -05:00
isn Update copyright for 2022 2022-01-07 19:04:57 -05:00
jsonb_plperl Expose internal function for converting int64 to numeric 2020-09-09 20:16:28 +02:00
jsonb_plpython plpython: Code cleanup related to removal of Python 2 support. 2022-03-07 18:30:28 -08:00
lo Fix bogus CALLED_AS_TRIGGER() defenses. 2020-04-03 11:24:56 -04:00
ltree Fix default signature length for gist_ltree_ops 2022-03-16 11:41:18 +03:00
ltree_plpython plpython: Code cleanup related to removal of Python 2 support. 2022-03-07 18:30:28 -08:00
oid2name Replace Test::More plans with done_testing 2022-02-11 20:54:44 +01:00
old_snapshot Update copyright for 2022 2022-01-07 19:04:57 -05:00
pageinspect pageinspect: Add more sanity checks to prevent out-of-bound reads 2022-03-27 17:53:40 +09:00
passwordcheck Improve error handling of cryptohash computations 2022-01-11 09:55:16 +09:00
pg_buffercache Remove support for upgrading extensions from "unpackaged" state. 2020-02-19 16:59:14 -05:00
pg_freespacemap Avoid instabilities with the regression tests of pg_freespacemap 2022-03-29 13:52:49 +09:00
pg_prewarm Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
pg_stat_statements Use has_privs_for_roles for predefined role checks 2022-03-28 15:10:04 -04:00
pg_surgery Remove xloginsert.h from xlog.h 2022-01-30 12:25:24 -03:00
pg_trgm Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
pg_visibility Remove xloginsert.h from xlog.h 2022-01-30 12:25:24 -03:00
pgcrypto pgcrypto: Remove internal padding implementation 2022-03-22 08:58:44 +01:00
pgrowlocks Use has_privs_for_roles for predefined role checks 2022-03-28 15:10:04 -04:00
pgstattuple Update copyright for 2022 2022-01-07 19:04:57 -05:00
postgres_fdw Optimize order of GROUP BY keys 2022-03-31 01:13:33 +02:00
seg Update copyright for 2022 2022-01-07 19:04:57 -05:00
sepgsql Disallow setting bogus GUCs within an extension's reserved namespace. 2022-02-21 14:10:43 -05:00
spi Remove support for upgrading extensions from "unpackaged" state. 2020-02-19 16:59:14 -05:00
sslinfo contrib/sslinfo needs a fix too to make hamerkop happy. 2021-11-07 11:33:53 -05:00
start-scripts Remove contrib/start-scripts/osx/. 2017-11-17 12:53:20 -05:00
tablefunc Remove all traces of tuplestore_donestoring() in the C code 2022-02-17 09:52:02 +09:00
tcn Update copyright for 2022 2022-01-07 19:04:57 -05:00
test_decoding Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
tsm_system_rows Update copyright for 2022 2022-01-07 19:04:57 -05:00
tsm_system_time Update copyright for 2022 2022-01-07 19:04:57 -05:00
unaccent Make update-unicode target work in vpath builds 2022-03-25 09:47:50 +01:00
uuid-ossp Improve error handling of cryptohash computations 2022-01-11 09:55:16 +09:00
vacuumlo Replace Test::More plans with done_testing 2022-02-11 20:54:44 +01:00
xml2 Simplify SRFs using materialize mode in contrib/ modules 2022-03-08 10:12:22 +09:00
contrib-global.mk Respect TEMP_CONFIG when pg_regress_check and friends are called 2016-02-27 12:28:21 -05:00
Makefile Add 'basebackup_to_shell' contrib module. 2022-03-15 13:24:23 -04:00
README Rename 'gmake' to 'make' in docs and recommended commands 2014-02-12 17:29:19 -05:00

The PostgreSQL contrib tree
---------------------------

This subtree contains porting tools, analysis utilities, and plug-in
features that are not part of the core PostgreSQL system, mainly
because they address a limited audience or are too experimental to be
part of the main source tree.  This does not preclude their
usefulness.

User documentation for each module appears in the main SGML
documentation.

When building from the source distribution, these modules are not
built automatically, unless you build the "world" target.  You can
also build and install them all by running "make all" and "make
install" in this directory; or to build and install just one selected
module, do the same in that module's subdirectory.

Some directories supply new user-defined functions, operators, or
types.  To make use of one of these modules, after you have installed
the code you need to register the new SQL objects in the database
system by executing a CREATE EXTENSION command.  In a fresh database,
you can simply do

    CREATE EXTENSION module_name;

See the PostgreSQL documentation for more information about this
procedure.