From df3e23e1b509001ac9336763404406bdcad04dcc Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Fri, 14 Nov 2014 17:19:29 -0500 Subject: [PATCH] Document evaluation-order considerations for aggregate functions. The SELECT reference page didn't really address the question of when aggregate function evaluation occurs, nor did the "expression evaluation rules" documentation mention that CASE can't be used to control whether an aggregate gets evaluated or not. Improve that. Per discussion of bug #11661. Original text by Marti Raudsepp and Michael Paquier, rewritten significantly by me. --- doc/src/sgml/ref/select.sgml | 36 ++++++++++++++++++++++++++---------- doc/src/sgml/syntax.sgml | 22 ++++++++++++++++++++++ 2 files changed, 48 insertions(+), 10 deletions(-) diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 4948a6d815..f38b42614f 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -109,9 +109,11 @@ TABLE [ ONLY ] table_name [ * ] - If the GROUP BY clause is specified, the + If the GROUP BY clause is specified, + or if there are aggregate function calls, the output is combined into groups of rows that match on one or more - values. If the HAVING clause is present, it + values, and the results of aggregate functions are computed. + If the HAVING clause is present, it eliminates groups that do not satisfy the given condition. (See and below.) @@ -637,17 +639,22 @@ GROUP BY expression [, ...] Aggregate functions, if any are used, are computed across all rows - making up each group, producing a separate value for each group - (whereas without GROUP BY, an aggregate - produces a single value computed across all the selected rows). - The set of rows fed to the aggregate function can be further filtered by + making up each group, producing a separate value for each group. + (If there are aggregate functions but no GROUP BY + clause, the query is treated as having a single group comprising all + the selected rows.) + The set of rows fed to each aggregate function can be further filtered by attaching a FILTER clause to the aggregate function call; see for more information. When a FILTER clause is present, only those rows matching it - are included. - When GROUP BY is present, it is not valid for + are included in the input to that aggregate function. + + + + When GROUP BY is present, + or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to - ungrouped columns except within aggregate functions or if the + ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if @@ -655,6 +662,14 @@ GROUP BY expression [, ...] the table containing the ungrouped column. + + Keep in mind that all aggregate functions are evaluated before + evaluating any scalar expressions in the HAVING + clause or SELECT list. This means that, for example, + a CASE expression cannot be used to skip evaluation of + an aggregate function; see . + + Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE and FOR KEY SHARE cannot be @@ -683,7 +698,8 @@ HAVING condition created by GROUP BY. Each column referenced in condition must unambiguously reference a grouping column, unless the reference - appears within an aggregate function. + appears within an aggregate function or the ungrouped column is + functionally dependent on the grouping columns. diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680fd0b..399ae07075 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2426,6 +2426,28 @@ SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END; example, it would be better to sidestep the problem by writing y > 1.5*x instead.) + + + A limitation of this technique is that a CASE cannot + prevent evaluation of an aggregate expression contained within it, + because aggregate expressions are computed before scalar + expressions in a SELECT list or HAVING clause + are considered. For example, the following query can cause a + division-by-zero error despite seemingly having protected against it: + +SELECT CASE WHEN min(employees) > 0 + THEN avg(expenses / employees) + END + FROM departments; + + The min() and avg() aggregates are computed + concurrently over all the input rows, so if any row + has employees equal to zero, the division-by-zero error + will occur before there is any opportunity to test the result of + min(). Instead, use a WHERE + or FILTER clause to prevent problematic input rows from + reaching an aggregate function in the first place. +