Improve documentation's description of JOIN clauses.

In bug #12000, Andreas Kunert complained that the documentation was
misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2".
That's correct as far as it goes, but the equivalence doesn't hold when
you consider three or more tables, since JOIN binds more tightly than
comma.  I added a <note> to explain this, and ended up rearranging some
of the existing text so that the note would make sense in context.

In passing, rewrite the description of JOIN USING, which was unnecessarily
vague, and hadn't been helped any by somebody's reliance on markup as a
substitute for clear writing.  (Mostly this involved reintroducing a
concrete example that was unaccountably removed by commit 032f3b7e166cfa28.)

Back-patch to all supported branches.
This commit is contained in:
Tom Lane 2014-11-19 16:00:24 -05:00
parent 88fc719263
commit 8372304e35
1 changed files with 98 additions and 85 deletions

View File

@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
</synopsis>
A table reference can be a table name (possibly schema-qualified),
or a derived table such as a subquery, a table join, or complex
combinations of these. If more than one table reference is listed
in the <literal>FROM</> clause they are cross-joined (see below)
to form the intermediate virtual table that can then be subject to
or a derived table such as a subquery, a <literal>JOIN</> construct, or
complex combinations of these. If more than one table reference is
listed in the <literal>FROM</> clause, the tables are cross-joined
(that is, the Cartesian product of their rows is formed; see below).
The result of the <literal>FROM</> list is an intermediate virtual
table that can then be subject to
transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
and <literal>HAVING</> clauses and is finally the result of the
overall table expression.
@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
A joined table is a table derived from two other (real or
derived) tables according to the rules of the particular join
type. Inner, outer, and cross-joins are available.
The general syntax of a joined table is
<synopsis>
<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
</synopsis>
Joins of all types can be chained together, or nested: either or
both <replaceable>T1</replaceable> and
<replaceable>T2</replaceable> can be joined tables. Parentheses
can be used around <literal>JOIN</> clauses to control the join
order. In the absence of parentheses, <literal>JOIN</> clauses
nest left-to-right.
</para>
<variablelist>
@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
<para>
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
<replaceable>T2</replaceable></literal> is equivalent to
<literal>FROM <replaceable>T1</replaceable>,
<replaceable>T2</replaceable></literal>. It is also equivalent to
<literal>FROM <replaceable>T1</replaceable> INNER JOIN
<replaceable>T2</replaceable> ON TRUE</literal> (see below).
It is also equivalent to
<literal>FROM <replaceable>T1</replaceable>,
<replaceable>T2</replaceable></literal>.
<note>
<para>
This latter equivalence does not hold exactly when more than two
tables appear, because <literal>JOIN</> binds more tightly than
comma. For example
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
ON <replaceable>condition</replaceable></literal>
is not the same as
<literal>FROM <replaceable>T1</replaceable>,
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
ON <replaceable>condition</replaceable></literal>
because the <replaceable>condition</replaceable> can
reference <replaceable>T1</replaceable> in the first case but not
the second.
</para>
</note>
</para>
</listitem>
</varlistentry>
@ -240,76 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
<quote>match</quote>, as explained in detail below.
</para>
<para>
The <literal>ON</> clause is the most general kind of join
condition: it takes a Boolean value expression of the same
kind as is used in a <literal>WHERE</> clause. A pair of rows
from <replaceable>T1</> and <replaceable>T2</> match if the
<literal>ON</> expression evaluates to true.
</para>
<para>
The <literal>USING</> clause allows you to take advantage of
the specific situation where both sides of the join use the
same name for the joining columns. It takes a
comma-separated list of the shared column names
and forms a join using the equals operator. Furthermore, the
output of <literal>JOIN USING</> has one column for each of the
listed columns, followed by the remaining columns from each table.
</para>
<para>The output column difference between <literal>ON</> and
<literal>USING</> when invoking <literal>SELECT *</> is:</para>
<itemizedlist>
<listitem>
<para>
<literal>ON</> - all columns from <replaceable>T1</> followed
by all columns from <replaceable>T2</>
</para>
</listitem>
<listitem>
<para>
<literal>USING</> - all join columns, one copy each
and in the listed order, followed by non-join columns
in <replaceable>T1</> followed by non-join columns in
<replaceable>T2</>
</para>
</listitem>
<listitem>
<para>
Examples provided below
</para>
</listitem>
</itemizedlist>
<para>
<indexterm>
<primary>join</primary>
<secondary>natural</secondary>
</indexterm>
<indexterm>
<primary>natural join</primary>
</indexterm>
Finally, <literal>NATURAL</> is a shorthand form of
<literal>USING</>: it forms a <literal>USING</> list
consisting of all column names that appear in both
input tables. As with <literal>USING</>, these columns appear
only once in the output table. If there are no common
columns, <literal>NATURAL</literal> behaves like
<literal>CROSS JOIN</literal>.
</para>
<note>
<para>
<literal>USING</literal> is reasonably safe from column changes
in the joined relations since only the specific columns mentioned
are considered. <literal>NATURAL</> is considerably more problematic
if you are referring to relations only by name (views and tables)
since any schema changes to either relation that cause a new matching
column name to be present will cause the join to consider that new column.
</para>
</note>
<para>
The possible types of qualified join are:
@ -387,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
</varlistentry>
</variablelist>
</para>
<para>
The <literal>ON</> clause is the most general kind of join
condition: it takes a Boolean value expression of the same
kind as is used in a <literal>WHERE</> clause. A pair of rows
from <replaceable>T1</> and <replaceable>T2</> match if the
<literal>ON</> expression evaluates to true.
</para>
<para>
The <literal>USING</> clause is a shorthand that allows you to take
advantage of the specific situation where both sides of the join use
the same name for the joining column(s). It takes a
comma-separated list of the shared column names
and forms a join condition that includes an equality comparison
for each one. For example, joining <replaceable>T1</>
and <replaceable>T2</> with <literal>USING (a, b)</> produces
the join condition <literal>ON <replaceable>T1</>.a
= <replaceable>T2</>.a AND <replaceable>T1</>.b
= <replaceable>T2</>.b</literal>.
</para>
<para>
Furthermore, the output of <literal>JOIN USING</> suppresses
redundant columns: there is no need to print both of the matched
columns, since they must have equal values. While <literal>JOIN
ON</> produces all columns from <replaceable>T1</> followed by all
columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
output column for each of the listed column pairs (in the listed
order), followed by any remaining columns from <replaceable>T1</>,
followed by any remaining columns from <replaceable>T2</>.
</para>
<para>
<indexterm>
<primary>join</primary>
<secondary>natural</secondary>
</indexterm>
<indexterm>
<primary>natural join</primary>
</indexterm>
Finally, <literal>NATURAL</> is a shorthand form of
<literal>USING</>: it forms a <literal>USING</> list
consisting of all column names that appear in both
input tables. As with <literal>USING</>, these columns appear
only once in the output table. If there are no common
column names, <literal>NATURAL</literal> behaves like
<literal>CROSS JOIN</literal>.
</para>
<note>
<para>
<literal>USING</literal> is reasonably safe from column changes
in the joined relations since only the listed columns
are combined. <literal>NATURAL</> is considerably more risky since
any schema changes to either relation that cause a new matching
column name to be present will cause the join to combine that new
column as well.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
<para>
Joins of all types can be chained together or nested: either or
both <replaceable>T1</replaceable> and
<replaceable>T2</replaceable> can be joined tables. Parentheses
can be used around <literal>JOIN</> clauses to control the join
order. In the absence of parentheses, <literal>JOIN</> clauses
nest left-to-right.
</para>
<para>
To put this together, assume we have tables <literal>t1</literal>:
<programlisting>
@ -516,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
clause is processed <emphasis>before</> the join, while
a restriction placed in the <literal>WHERE</> clause is processed
<emphasis>after</> the join.
That does not matter with inner joins, but it matters a lot with outer
joins.
</para>
</sect3>