From 8372304e3594a1d67737df779f098d9ae9937603 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Wed, 19 Nov 2014 16:00:24 -0500 Subject: [PATCH] Improve documentation's description of JOIN clauses. In bug #12000, Andreas Kunert complained that the documentation was misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2". That's correct as far as it goes, but the equivalence doesn't hold when you consider three or more tables, since JOIN binds more tightly than comma. I added a to explain this, and ended up rearranging some of the existing text so that the note would make sense in context. In passing, rewrite the description of JOIN USING, which was unnecessarily vague, and hadn't been helped any by somebody's reliance on markup as a substitute for clear writing. (Mostly this involved reintroducing a concrete example that was unaccountably removed by commit 032f3b7e166cfa28.) Back-patch to all supported branches. --- doc/src/sgml/queries.sgml | 183 ++++++++++++++++++++------------------ 1 file changed, 98 insertions(+), 85 deletions(-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index c4a8ca8d8a..7dbad462a5 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -118,10 +118,12 @@ FROM table_reference , table_r A table reference can be a table name (possibly schema-qualified), - or a derived table such as a subquery, a table join, or complex - combinations of these. If more than one table reference is listed - in the FROM clause they are cross-joined (see below) - to form the intermediate virtual table that can then be subject to + or a derived table such as a subquery, a JOIN construct, or + complex combinations of these. If more than one table reference is + listed in the FROM clause, the tables are cross-joined + (that is, the Cartesian product of their rows is formed; see below). + The result of the FROM list is an intermediate virtual + table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression. @@ -161,6 +163,16 @@ FROM table_reference , table_r A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. + The general syntax of a joined table is + +T1 join_type T2 join_condition + + Joins of all types can be chained together, or nested: either or + both T1 and + T2 can be joined tables. Parentheses + can be used around JOIN clauses to control the join + order. In the absence of parentheses, JOIN clauses + nest left-to-right. @@ -197,10 +209,28 @@ FROM table_reference , table_r FROM T1 CROSS JOIN T2 is equivalent to - FROM T1, - T2. It is also equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below). + It is also equivalent to + FROM T1, + T2. + + + This latter equivalence does not hold exactly when more than two + tables appear, because JOIN binds more tightly than + comma. For example + FROM T1 CROSS JOIN + T2 INNER JOIN T3 + ON condition + is not the same as + FROM T1, + T2 INNER JOIN T3 + ON condition + because the condition can + reference T1 in the first case but not + the second. + + @@ -240,76 +270,6 @@ FROM table_reference , table_r match, as explained in detail below. - - The ON clause is the most general kind of join - condition: it takes a Boolean value expression of the same - kind as is used in a WHERE clause. A pair of rows - from T1 and T2 match if the - ON expression evaluates to true. - - - - The USING clause allows you to take advantage of - the specific situation where both sides of the join use the - same name for the joining columns. It takes a - comma-separated list of the shared column names - and forms a join using the equals operator. Furthermore, the - output of JOIN USING has one column for each of the - listed columns, followed by the remaining columns from each table. - - - The output column difference between ON and - USING when invoking SELECT * is: - - - - ON - all columns from T1 followed - by all columns from T2 - - - - - USING - all join columns, one copy each - and in the listed order, followed by non-join columns - in T1 followed by non-join columns in - T2 - - - - - Examples provided below - - - - - - - join - natural - - - natural join - - Finally, NATURAL is a shorthand form of - USING: it forms a USING list - consisting of all column names that appear in both - input tables. As with USING, these columns appear - only once in the output table. If there are no common - columns, NATURAL behaves like - CROSS JOIN. - - - - - USING is reasonably safe from column changes - in the joined relations since only the specific columns mentioned - are considered. NATURAL is considerably more problematic - if you are referring to relations only by name (views and tables) - since any schema changes to either relation that cause a new matching - column name to be present will cause the join to consider that new column. - - - The possible types of qualified join are: @@ -387,19 +347,70 @@ FROM table_reference , table_r + + + The ON clause is the most general kind of join + condition: it takes a Boolean value expression of the same + kind as is used in a WHERE clause. A pair of rows + from T1 and T2 match if the + ON expression evaluates to true. + + + + The USING clause is a shorthand that allows you to take + advantage of the specific situation where both sides of the join use + the same name for the joining column(s). It takes a + comma-separated list of the shared column names + and forms a join condition that includes an equality comparison + for each one. For example, joining T1 + and T2 with USING (a, b) produces + the join condition ON T1.a + = T2.a AND T1.b + = T2.b. + + + + Furthermore, the output of JOIN USING suppresses + redundant columns: there is no need to print both of the matched + columns, since they must have equal values. While JOIN + ON produces all columns from T1 followed by all + columns from T2, JOIN USING produces one + output column for each of the listed column pairs (in the listed + order), followed by any remaining columns from T1, + followed by any remaining columns from T2. + + + + + join + natural + + + natural join + + Finally, NATURAL is a shorthand form of + USING: it forms a USING list + consisting of all column names that appear in both + input tables. As with USING, these columns appear + only once in the output table. If there are no common + column names, NATURAL behaves like + CROSS JOIN. + + + + + USING is reasonably safe from column changes + in the joined relations since only the listed columns + are combined. NATURAL is considerably more risky since + any schema changes to either relation that cause a new matching + column name to be present will cause the join to combine that new + column as well. + + - - Joins of all types can be chained together or nested: either or - both T1 and - T2 can be joined tables. Parentheses - can be used around JOIN clauses to control the join - order. In the absence of parentheses, JOIN clauses - nest left-to-right. - - To put this together, assume we have tables t1: @@ -516,6 +527,8 @@ FROM table_reference , table_r clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. + That does not matter with inner joins, but it matters a lot with outer + joins.