postgresql/src/backend/parser
David Rowley 1349d2790b Improve performance of ORDER BY / DISTINCT aggregates
ORDER BY / DISTINCT aggregates have, ever since they were implemented in
Postgres, been executed by always performing a sort in nodeAgg.c to put the
tuples of the current group into the correct order before calling the
transition function on the sorted tuples.  This was not ideal, as there is
often an index that could provide pre-sorted input and allow the
transition functions to be called as the rows come in, rather than having
to store them in a tuplestore in order to sort them once all the tuples
for the group have arrived.

Here we change the planner so it requests a path with a sort order which
supports the largest number of ORDER BY / DISTINCT aggregate functions and
add new code to the executor to allow it to support the processing of
ORDER BY / DISTINCT aggregates where the tuples are already sorted in the
correct order.

Since there can be many ORDER BY / DISTINCT aggregates in any given query
level, it's very possible that we can't find an order that suits all of
these aggregates.  The sort order that the planner chooses is simply the
one that suits the most aggregate functions.  We take the most strictly
sorted variation of each order and see how many aggregate functions can
use that, then we try again with the order of the remaining aggregates to
see if another order would suit more aggregate functions.  For example:

SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...

would request the sort order to be {a,b} because {a} is a subset of the
sort order of {a,b}, but:

SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

would just pick a plan ordered by {a} (we give precedence to aggregates
which are earlier in the targetlist).

SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...

would choose to order by {b} since two aggregates suit that vs just one
that requires input ordered by {a}.
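
The selection rule described above can be sketched in a few lines of C.
This is only an illustration, not the planner's actual code: the SortOrder
struct, is_prefix() and choose_sort_order() are hypothetical names, sort
orders are modelled as plain lists of column names rather than pathkeys,
and "{a} is a subset of {a,b}" is assumed to mean that {a} is a leading
prefix of {a,b}.

```c
#include <string.h>

#define MAX_KEYS 8
#define MAX_AGGS 32

/* Hypothetical stand-in for a pathkey list: column names in sort order. */
typedef struct SortOrder
{
	int			nkeys;
	const char *keys[MAX_KEYS];
} SortOrder;

/* Return 1 if "shorter" is a leading prefix of (or equal to) "longer". */
int
is_prefix(const SortOrder *shorter, const SortOrder *longer)
{
	if (shorter->nkeys > longer->nkeys)
		return 0;
	for (int i = 0; i < shorter->nkeys; i++)
	{
		if (strcmp(shorter->keys[i], longer->keys[i]) != 0)
			return 0;
	}
	return 1;
}

/*
 * Choose the sort order that suits the most aggregates.  aggs[] holds the
 * order each aggregate asks for, in targetlist order.  Starting from each
 * aggregate not yet grouped, we widen the candidate order to its strictest
 * compatible variation, count the aggregates it suits, and keep the
 * candidate with the highest count.  Ties go to the earlier aggregate
 * because a later candidate must be strictly better to replace it.
 */
SortOrder
choose_sort_order(const SortOrder *aggs, int naggs)
{
	SortOrder	best = {0};
	int			best_count = 0;
	int			done[MAX_AGGS] = {0};

	if (naggs > MAX_AGGS)
		naggs = MAX_AGGS;		/* sketch only: ignore the overflow */

	for (int i = 0; i < naggs; i++)
	{
		SortOrder	curr;
		int			count = 0;

		if (done[i])
			continue;
		curr = aggs[i];

		for (int j = i; j < naggs; j++)
		{
			if (done[j])
				continue;
			if (is_prefix(&curr, &aggs[j]))
			{
				/* aggs[j] is at least as strict: adopt it */
				curr = aggs[j];
				done[j] = 1;
				count++;
			}
			else if (is_prefix(&aggs[j], &curr))
			{
				/* aggs[j] is satisfied by the stricter order */
				done[j] = 1;
				count++;
			}
		}

		if (count > best_count)
		{
			best = curr;
			best_count = count;
		}
	}
	return best;
}
```

Under those assumptions, feeding the sketch the three example queries
yields {a,b}, {a} and {b} respectively, matching the outcomes described
above.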

Author: David Rowley
Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
2022-08-02 23:11:45 +12:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Makefile JSON_TABLE 2022-04-04 16:03:47 -04:00
README Update src/backend/parser/README 2022-07-22 12:56:21 +02:00
analyze.c Make subquery aliases optional in the FROM clause. 2022-07-20 09:29:42 +01:00
check_keywords.pl Update copyright for 2022 2022-01-07 19:04:57 -05:00
gram.y Fix a few issues with REINDEX grammar 2022-07-26 10:16:26 +09:00
parse_agg.c Add support for MERGE SQL command 2022-03-28 16:47:48 +02:00
parse_clause.c Make subquery aliases optional in the FROM clause. 2022-07-20 09:29:42 +01:00
parse_coerce.c Fix failure to validate the result of select_common_type(). 2022-01-29 11:41:18 -05:00
parse_collate.c Pre-beta mechanical code beautification. 2022-05-12 15:17:30 -04:00
parse_cte.c Update copyright for 2022 2022-01-07 19:04:57 -05:00
parse_enr.c Update copyright for 2022 2022-01-07 19:04:57 -05:00
parse_expr.c Improve performance of ORDER BY / DISTINCT aggregates 2022-08-02 23:11:45 +12:00
parse_func.c Improve performance of ORDER BY / DISTINCT aggregates 2022-08-02 23:11:45 +12:00
parse_jsontable.c Tweak detail and hint messages to be consistent with project policy 2022-07-20 09:50:12 +09:00
parse_merge.c Change mechanism to set up source targetlist in MERGE 2022-04-12 09:29:39 +02:00
parse_node.c In transformRowExpr(), check for too many columns in the row. 2022-07-29 13:31:10 -04:00
parse_oper.c Update copyright for 2022 2022-01-07 19:04:57 -05:00
parse_param.c Pre-beta mechanical code beautification. 2022-05-12 15:17:30 -04:00
parse_relation.c Check maximum number of columns in function RTEs, too. 2022-08-01 12:22:35 -04:00
parse_target.c Replace many MemSet calls with struct initialization 2022-07-16 08:50:49 +02:00
parse_type.c Add construct_array_builtin, deconstruct_array_builtin 2022-07-01 11:23:15 +02:00
parse_utilcmd.c Change internal RelFileNode references to RelFileNumber or RelFileLocator. 2022-07-06 11:39:09 -04:00
parser.c SQL/JSON constructors 2022-03-27 17:03:34 -04:00
scan.l Reject trailing junk after numeric literals 2022-02-16 10:37:31 +01:00
scansup.c Update copyright for 2022 2022-01-07 19:04:57 -05:00

README

src/backend/parser/README

Parser
======

This directory does more than tokenize and parse SQL queries.  It also
creates Query structures for the various complex queries that are passed
to the optimizer and then the executor.

parser.c	things start here
scan.l		break query into tokens
scansup.c	handle escapes in input strings
gram.y		parse the tokens and produce a "raw" parse tree
analyze.c	top level of parse analysis for optimizable queries
parse_agg.c	handle aggregates, like SUM(col1), AVG(col2), ...
parse_clause.c	handle clauses like WHERE, ORDER BY, GROUP BY, ...
parse_coerce.c	handle coercing expressions to different data types
parse_collate.c	assign collation information in completed expressions
parse_cte.c	handle Common Table Expressions (WITH clauses)
parse_expr.c	handle expressions like col, col + 3, x = 3 or x = 4
parse_enr.c	handle ephemeral named rels (trigger transition tables, ...)
parse_func.c	handle functions, table.column and column identifiers
parse_jsontable.c handle JSON_TABLE
parse_merge.c	handle MERGE
parse_node.c	create nodes for various structures
parse_oper.c	handle operators in expressions
parse_param.c	handle Params (for the cases used in the core backend)
parse_relation.c support routines for tables and column handling
parse_target.c	handle the result list of the query
parse_type.c	support routines for data type handling
parse_utilcmd.c	parse analysis for utility commands (done at execution time)

See also src/common/keywords.c, which contains the table of standard
keywords and the keyword lookup function.  We separated that out because
various frontend code wants to use it too.