Improve documentation's description of JOIN clauses.
In bug #12000, Andreas Kunert complained that the documentation was misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2". That's correct as far as it goes, but the equivalence doesn't hold when you consider three or more tables, since JOIN binds more tightly than comma. I added a <note> to explain this, and ended up rearranging some of the existing text so that the note would make sense in context. In passing, rewrite the description of JOIN USING, which was unnecessarily vague, and hadn't been helped any by somebody's reliance on markup as a substitute for clear writing. (Mostly this involved reintroducing a concrete example that was unaccountably removed by commit 032f3b7e166cfa28.) Back-patch to all supported branches.
This commit is contained in:
parent
88fc719263
commit
8372304e35
|
@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
</synopsis>
|
||||
|
||||
A table reference can be a table name (possibly schema-qualified),
|
||||
or a derived table such as a subquery, a table join, or complex
|
||||
combinations of these. If more than one table reference is listed
|
||||
in the <literal>FROM</> clause they are cross-joined (see below)
|
||||
to form the intermediate virtual table that can then be subject to
|
||||
or a derived table such as a subquery, a <literal>JOIN</> construct, or
|
||||
complex combinations of these. If more than one table reference is
|
||||
listed in the <literal>FROM</> clause, the tables are cross-joined
|
||||
(that is, the Cartesian product of their rows is formed; see below).
|
||||
The result of the <literal>FROM</> list is an intermediate virtual
|
||||
table that can then be subject to
|
||||
transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
|
||||
and <literal>HAVING</> clauses and is finally the result of the
|
||||
overall table expression.
|
||||
|
@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
A joined table is a table derived from two other (real or
|
||||
derived) tables according to the rules of the particular join
|
||||
type. Inner, outer, and cross-joins are available.
|
||||
The general syntax of a joined table is
|
||||
<synopsis>
|
||||
<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
|
||||
</synopsis>
|
||||
Joins of all types can be chained together, or nested: either or
|
||||
both <replaceable>T1</replaceable> and
|
||||
<replaceable>T2</replaceable> can be joined tables. Parentheses
|
||||
can be used around <literal>JOIN</> clauses to control the join
|
||||
order. In the absence of parentheses, <literal>JOIN</> clauses
|
||||
nest left-to-right.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
<para>
|
||||
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
|
||||
<replaceable>T2</replaceable></literal> is equivalent to
|
||||
<literal>FROM <replaceable>T1</replaceable>,
|
||||
<replaceable>T2</replaceable></literal>. It is also equivalent to
|
||||
<literal>FROM <replaceable>T1</replaceable> INNER JOIN
|
||||
<replaceable>T2</replaceable> ON TRUE</literal> (see below).
|
||||
It is also equivalent to
|
||||
<literal>FROM <replaceable>T1</replaceable>,
|
||||
<replaceable>T2</replaceable></literal>.
|
||||
<note>
|
||||
<para>
|
||||
This latter equivalence does not hold exactly when more than two
|
||||
tables appear, because <literal>JOIN</> binds more tightly than
|
||||
comma. For example
|
||||
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
|
||||
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
|
||||
ON <replaceable>condition</replaceable></literal>
|
||||
is not the same as
|
||||
<literal>FROM <replaceable>T1</replaceable>,
|
||||
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
|
||||
ON <replaceable>condition</replaceable></literal>
|
||||
because the <replaceable>condition</replaceable> can
|
||||
reference <replaceable>T1</replaceable> in the first case but not
|
||||
the second.
|
||||
</para>
|
||||
</note>
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
@ -240,76 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
<quote>match</quote>, as explained in detail below.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <literal>ON</> clause is the most general kind of join
|
||||
condition: it takes a Boolean value expression of the same
|
||||
kind as is used in a <literal>WHERE</> clause. A pair of rows
|
||||
from <replaceable>T1</> and <replaceable>T2</> match if the
|
||||
<literal>ON</> expression evaluates to true.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <literal>USING</> clause allows you to take advantage of
|
||||
the specific situation where both sides of the join use the
|
||||
same name for the joining columns. It takes a
|
||||
comma-separated list of the shared column names
|
||||
and forms a join using the equals operator. Furthermore, the
|
||||
output of <literal>JOIN USING</> has one column for each of the
|
||||
listed columns, followed by the remaining columns from each table.
|
||||
</para>
|
||||
|
||||
<para>The output column difference between <literal>ON</> and
|
||||
<literal>USING</> when invoking <literal>SELECT *</> is:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>ON</> - all columns from <replaceable>T1</> followed
|
||||
by all columns from <replaceable>T2</>
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<literal>USING</> - all join columns, one copy each
|
||||
and in the listed order, followed by non-join columns
|
||||
in <replaceable>T1</> followed by non-join columns in
|
||||
<replaceable>T2</>
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Examples provided below
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
<indexterm>
|
||||
<primary>join</primary>
|
||||
<secondary>natural</secondary>
|
||||
</indexterm>
|
||||
<indexterm>
|
||||
<primary>natural join</primary>
|
||||
</indexterm>
|
||||
Finally, <literal>NATURAL</> is a shorthand form of
|
||||
<literal>USING</>: it forms a <literal>USING</> list
|
||||
consisting of all column names that appear in both
|
||||
input tables. As with <literal>USING</>, these columns appear
|
||||
only once in the output table. If there are no common
|
||||
columns, <literal>NATURAL</literal> behaves like
|
||||
<literal>CROSS JOIN</literal>.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
<literal>USING</literal> is reasonably safe from column changes
|
||||
in the joined relations since only the specific columns mentioned
|
||||
are considered. <literal>NATURAL</> is considerably more problematic
|
||||
if you are referring to relations only by name (views and tables)
|
||||
since any schema changes to either relation that cause a new matching
|
||||
column name to be present will cause the join to consider that new column.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>
|
||||
The possible types of qualified join are:
|
||||
|
||||
|
@ -387,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <literal>ON</> clause is the most general kind of join
|
||||
condition: it takes a Boolean value expression of the same
|
||||
kind as is used in a <literal>WHERE</> clause. A pair of rows
|
||||
from <replaceable>T1</> and <replaceable>T2</> match if the
|
||||
<literal>ON</> expression evaluates to true.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <literal>USING</> clause is a shorthand that allows you to take
|
||||
advantage of the specific situation where both sides of the join use
|
||||
the same name for the joining column(s). It takes a
|
||||
comma-separated list of the shared column names
|
||||
and forms a join condition that includes an equality comparison
|
||||
for each one. For example, joining <replaceable>T1</>
|
||||
and <replaceable>T2</> with <literal>USING (a, b)</> produces
|
||||
the join condition <literal>ON <replaceable>T1</>.a
|
||||
= <replaceable>T2</>.a AND <replaceable>T1</>.b
|
||||
= <replaceable>T2</>.b</literal>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Furthermore, the output of <literal>JOIN USING</> suppresses
|
||||
redundant columns: there is no need to print both of the matched
|
||||
columns, since they must have equal values. While <literal>JOIN
|
||||
ON</> produces all columns from <replaceable>T1</> followed by all
|
||||
columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
|
||||
output column for each of the listed column pairs (in the listed
|
||||
order), followed by any remaining columns from <replaceable>T1</>,
|
||||
followed by any remaining columns from <replaceable>T2</>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<indexterm>
|
||||
<primary>join</primary>
|
||||
<secondary>natural</secondary>
|
||||
</indexterm>
|
||||
<indexterm>
|
||||
<primary>natural join</primary>
|
||||
</indexterm>
|
||||
Finally, <literal>NATURAL</> is a shorthand form of
|
||||
<literal>USING</>: it forms a <literal>USING</> list
|
||||
consisting of all column names that appear in both
|
||||
input tables. As with <literal>USING</>, these columns appear
|
||||
only once in the output table. If there are no common
|
||||
column names, <literal>NATURAL</literal> behaves like
|
||||
<literal>CROSS JOIN</literal>.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
<literal>USING</literal> is reasonably safe from column changes
|
||||
in the joined relations since only the listed columns
|
||||
are combined. <literal>NATURAL</> is considerably more risky since
|
||||
any schema changes to either relation that cause a new matching
|
||||
column name to be present will cause the join to combine that new
|
||||
column as well.
|
||||
</para>
|
||||
</note>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
|
||||
<para>
|
||||
Joins of all types can be chained together or nested: either or
|
||||
both <replaceable>T1</replaceable> and
|
||||
<replaceable>T2</replaceable> can be joined tables. Parentheses
|
||||
can be used around <literal>JOIN</> clauses to control the join
|
||||
order. In the absence of parentheses, <literal>JOIN</> clauses
|
||||
nest left-to-right.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
To put this together, assume we have tables <literal>t1</literal>:
|
||||
<programlisting>
|
||||
|
@ -516,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
|
|||
clause is processed <emphasis>before</> the join, while
|
||||
a restriction placed in the <literal>WHERE</> clause is processed
|
||||
<emphasis>after</> the join.
|
||||
That does not matter with inner joins, but it matters a lot with outer
|
||||
joins.
|
||||
</para>
|
||||
</sect3>
|
||||
|
||||
|
|
Loading…
Reference in New Issue