From 21a3857f1f4a9a8c82355f753dfacff752fdebf3 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 22 Jan 2001 23:34:33 +0000 Subject: [PATCH] Rip out table expression section from SQL syntax chapter and develop it into new chapter on query (SELECT) syntax. In the end this should become a narrative and example-filled counterpart to the SELECT reference page. --- doc/src/sgml/filelist.sgml | 4 +- doc/src/sgml/queries.sgml | 819 +++++++++++++++++++++++++++++++++++++ doc/src/sgml/syntax.sgml | 538 +----------------------- doc/src/sgml/user.sgml | 3 +- 4 files changed, 832 insertions(+), 532 deletions(-) create mode 100644 doc/src/sgml/queries.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 799f517144..5d784d7dcc 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -1,4 +1,4 @@ - + @@ -31,7 +31,7 @@ - + diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml new file mode 100644 index 0000000000..e9ede07641 --- /dev/null +++ b/doc/src/sgml/queries.sgml @@ -0,0 +1,819 @@ + + + + Queries + + + A query is the process of or the command to + retrieve data from a database. In SQL the SELECT + command is used to specify queries. The general syntax of the + SELECT command is + +SELECT select_list FROM table_expression sort_specification + + The following sections describe the details of the select list, the + table expression, and the sort specification. The simplest kind of + query has the form + +SELECT * FROM table1; + + Assuming that there is a table called table1, this command would + retrieve all rows and all columns from table1. (The method of + retrieval depends on the client application. For example, the + psql program will display an ASCII-art + table on the screen, client libraries will offer functions to + retrieve individual rows and columns.) The select list + specification * means all columns that the table + expression happens to provide. A select list can also select a + subset of the available columns or even make calculations on the + columns before retrieving them; see . For example, if table1 has columns + named a, b, and c (and perhaps others) you can make the following + query: + +SELECT a, b + c FROM table1; + + (assuming that b and c are of a numeric data type). + + + + FROM table1 is a particularly simple kind of + table expression. In general, table expressions can be complex + constructs of base tables, joins, and subqueries. But you can also + omit the table expression entirely and use the SELECT command as a + calculator: + +SELECT 3 * 4; + + This is more useful if the expressions in the select list return + varying results. For example, you could call a function this way. + +SELECT random(); + + + + + Table Expressions + + + A table expression specifies a table. The + table expression contains a FROM clause that is optionally followed + by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions + simply refer to a table on disk, a so-called base table, but more + complex expressions can be used to modify or combine base tables in + various ways. + + + + The WHERE, GROUP BY, and HAVING clauses in the table expression + specify a pipeline of successive transformations performed on the + table derived in the FROM clause. The final transformed table that + is derived provides the input rows used to derive output rows as + specified by the select list of derived column value expressions. + + + + FROM clause + + + The FROM clause derives a table from one or more other tables + given in a comma-separated table reference list. + +FROM table_reference , table_reference , ... + + + A table reference may be a table name or a derived table such as a + subquery, a table join, or complex combinations of these. If more + than one table reference is listed in the FROM clause they are + CROSS JOINed (see below) to form the derived table that may then + be subject to transformations by the WHERE, GROUP BY, and HAVING + clauses and is finally the result of the overall table expression. + + + + If a table reference is a simple table name and it is the + supertable in a table inheritance hierarchy, rows of the table + include rows from all of its subtable successors unless the + keyword ONLY precedes the table name. + + + + Joined Tables + + + A joined table is a table derived from two other (real or + derived) tables according to the rules of the particular join + type. INNER, OUTER, NATURAL, and CROSS JOIN are supported. + + + + Join Types + + + CROSS JOIN + + + +T1 CROSS JOIN T2 + + + + For each combination of rows from + T1 and + T2 the derived table will contain a + row consisting of all columns in T1 + followed by all columns in T2. If + the tables have have N and M rows respectively, the joined + table will have N * M rows. A cross join is essentially an + INNER JOIN ON TRUE. + + + + + FROM T1 CROSS JOIN + T2 is equivalent to + FROM T1, + T2. + + + + + + + Qualified JOINs + + + +T1 { INNER | { LEFT | RIGHT | FULL } OUTER } JOIN T2 ON boolean expression +T1 { INNER | { LEFT | RIGHT | FULL } OUTER } JOIN T2 USING ( join column list ) + + + + The words INNER and OUTER are + optional for all JOINs. INNER is the default; + LEFT, RIGHT, and + FULL are for OUTER JOINs only. + + + + The join condition is specified in the + ON or USING clause. (The meaning of the join condition + depends on the particular join type; see below.) The ON + clause takes a Boolean value expression of the same kind as is + used in a WHERE clause. The USING clause takes a + comma-separated list of column names, which the joined tables + must have in common, and joins the tables on the equality of + those columns as a set, resulting in a joined table having one + column for each common column listed and all of the other + columns from both tables. Thus, USING (a, b, + c) is equivalent to ON (t1.a = t2.a AND + t1.b = t2.b AND t1.c = t2.c) with the exception that + if ON is used there will be two columns a, b, and c in the + result, whereas with USING there will be only one of each. + + + + + INNER JOIN + + + + For each row R1 of T1, the joined table has a row for each + row in T2 that satisfies the join condition with R1. + + + + + + LEFT OUTER JOIN + + + + First, an INNER JOIN is performed. Then, for a row in T1 + that does not satisfy the join condition with any row in + T2, a joined row is returned with NULL values in columns of + T2. Thus, the joined table unconditionally has a row for each + row in T1. + + + + + + RIGHT OUTER JOIN + + + + This is like a left join, only that the result table will + unconditionally have a row for each row in T2. + + + + + + FULL OUTER JOIN + + + + First, an INNER JOIN is performed. Then, for each row in + T1 that does not satisfy the join condition with any row in + T2, a joined row is returned with null values in columns of + T2. Also, for each row of T2 that does not satisfy the + join condition with any row in T1, a joined row with null + values in the columns of T1 is returned. + + + + + + + + + NATURAL JOIN + + + +T1 NATURAL { INNER | { LEFT | RIGHT | FULL } OUTER JOIN T2 + + + A natural join creates a joined table where every pair of matching + column names between the two tables are merged into one column. The + join specification is effectively a USING clause containing all the + common column names and is otherwise like a Qualified JOIN. + + + + + + + Joins of all types can be chained together or nested where either + or both of T1 and + T2 may be JOINed tables. Parenthesis + can be used around JOIN clauses to control the join order which + are otherwise left to right. + + + + + Subqueries + + + Subqueries specifying a derived table must be enclosed in + parenthesis and must be named using an AS + clause. (See .) + + + +FROM (SELECT * FROM table1) AS alias_name + + + + This example is equivalent to FROM table1 AS + alias_name. Many subqueries can be written as table + joins instead. + + + + + Table and Column Aliases + + + A temporary name can be given to tables and complex table + references to be used for references to the derived table in + further processing. This is called a table + alias. + +FROM table_reference AS alias + + Here, alias can be any regular + identifier. The alias becomes the new name of the table + reference for the current query -- it is no longer possible to + refer to the table by the original name (if the table reference + was an ordinary base table). Thus + +SELECT * FROM my_table AS m WHERE my_table.a > 5; + + is not valid SQL syntax. What will happen instead, as a + Postgres extension, is that an implicit + table reference is added to the FROM clause, so the query is + processed as if it was written as + +SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5; + + Table aliases are mainly for notational convenience, but it is + necessary to use them when joining a table to itself, e.g., + +SELECT * FROM my_table AS a CROSS JOIN my_table AS b ... + + Additionally, an alias is required if the table reference is a + subquery. + + + + Parenthesis are used to resolve ambiguities. The following + statement will assign the alias b to the + result of the join, unlike the previous example: + +SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ... + + + + + +FROM table_reference alias + + This form is equivalent the previously treated one; the + AS key word is noise. + + + + +FROM table_reference AS alias ( column1 , column2 , ... ) + + In addition to renaming the table as described above, the columns + of the table are also given temporary names. If less column + aliases are specified than the actual table has columns, the last + columns are not renamed. This syntax is especially useful for + self-joins or subqueries. + + + + + Examples + + + +FROM T1 INNER JOIN T2 USING (C) +FROM T1 LEFT OUTER JOIN T2 USING (C) +FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1 +FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2) + +FROM T1 NATURAL INNER JOIN T2 +FROM T1 NATURAL LEFT OUTER JOIN T2 +FROM T1 NATURAL RIGHT OUTER JOIN T2 +FROM T1 NATURAL FULL OUTER JOIN T2 + +FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3 +FROM (SELECT * FROM T1) DT1, T2, T3 + + + Above are some examples of joined tables and complex derived + tables. Notice how the AS clause renames or names a derived + table and how the optional comma-separated list of column names + that follows gives names or renames the columns. The last two + FROM clauses produce the same derived table from T1, T2, and T3. + The AS keyword was omitted in naming the subquery as DT1. The + keywords OUTER and INNER are noise that can be omitted also. + + + + + + + WHERE clause + + + The syntax of the WHERE clause is + +WHERE search condition + + where search condition is any value + expression as defined in that + returns a value of type boolean. + + + + After the processing of the FROM clause is done, each row of the + derived table is checked against the search condition. If the + result of the condition is true, the row is kept in the output + table, otherwise (that is, if the result is false or NULL) it is + discarded. The search condition typically references at least some + column in the table generated in the FROM clause; this is not + required, but otherwise the WHERE clause will be fairly useless. + + + + + Before the implementation of the JOIN syntax, it was necessary to + put the join condition of an inner join in the WHERE clause. For + example, these table expressions are equivalent: + +FROM a, b WHERE a.id = b.id AND b.val > 5 + + and + +FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5 + + or perhaps even + +FROM a NATURAL JOIN b WHERE b.val > 5 + + Which one of these you use is mainly a matter of style. The JOIN + syntax in the FROM clause is probably not as portable to other + products. For outer joins there is no choice in any case: they + must be done in the FROM clause. + + + + +FROM FDT WHERE + C1 > 5 + +FROM FDT WHERE + C1 IN (1, 2, 3) +FROM FDT WHERE + C1 IN (SELECT C1 FROM T2) +FROM FDT WHERE + C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) + +FROM FDT WHERE + C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100 + +FROM FDT WHERE + EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1) + + + + In the examples above, FDT is the table derived in the FROM + clause. Rows that do not meet the search condition of the where + clause are eliminated from FDT. Notice the use of scalar + subqueries as value expressions (C2 assumed UNIQUE). Just like + any other query, the subqueries can employ complex table + expressions. Notice how FDT is referenced in the subqueries. + Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a + column in the derived input table of the subquery. Qualifying the + column name adds clarity even when it is not needed. The column + naming scope of an outer query extends into its inner queries. + + + + + + GROUP BY and HAVING clauses + + + After passing the WHERE filter, the derived input table may be + subject to grouping, using the GROUP BY clause, and elimination of + group rows using the HAVING clause. + + + +SELECT select_list FROM ... WHERE ... GROUP BY grouping_column_reference , grouping_column_reference... + + + + The GROUP BY clause is used to group together rows in a table that + share the same values in all the columns listed. The order in + which the columns are listed does not matter (as opposed to an + ORDER BY clause). The purpose is to reduce each group of rows + sharing common values into one group row that is representative of + all rows in the group. This is done to eliminate redundancy in + the output and/or obtain aggregates that apply to these groups. + + + + Once a table is grouped, columns that are not included in the + grouping cannot be referenced, except in aggregate expressions, + since a specific value in those columns is ambiguous - which row + in the group should it come from? The grouped-by columns can be + referenced in select list column expressions since they have a + known constant value per group. Aggregate functions on the + ungrouped columns provide values that span the rows of a group, + not of the whole table. For instance, a + sum(sales) on a grouped table by product code + gives the total sales for each product, not the total sales on all + products. The aggregates of the ungrouped columns are + representative of the group, whereas their individual values may + not be. + + + + Example: + +SELECT pid, p.name, (sum(s.units) * p.price) AS sales + FROM products p LEFT JOIN sales s USING ( pid ) + GROUP BY pid, p.name, p.price; + + In this example, the columns pid, p.name, and p.price must be in + the GROUP BY clause since they are referenced in the query select + list. The column s.units does not have to be in the GROUP BY list + since it is only used in an aggregate expression + (sum()), which represents the group of sales + of a product. For each product, a summary row is returned about + all sales of the product. + + + + In strict SQL, GROUP BY can only group by columns of the source + table but Postgres extends this to also allow GROUP BY to group by + select columns in the query select list. Grouping by value + expressions instead of simple column names is also allowed. + + + + +SELECT select_list FROM ... WHERE ... GROUP BY ... HAVING boolean_expression + + If a table has been grouped using a GROUP BY clause, but then only + certain groups are of interest, the HAVING clause can be used, + much like a WHERE clause, to eliminate groups from a grouped + table. For some queries, Postgres allows a HAVING clause to be + used without a GROUP BY and then it acts just like another WHERE + clause, but the point in using HAVING that way is not clear. Since + HAVING operates on groups, only grouped columns can be listed in + the HAVING clause. If selection based on some ungrouped column is + desired, it should be expressed in the WHERE clause. + + + + Example: + +SELECT pid AS "Products", + p.name AS "Over 5000", + (sum(s.units) * (p.price - p.cost)) AS "Past Month Profit" + FROM products p LEFT JOIN sales s USING ( pid ) + WHERE p.date > CURRENT_DATE - INTERVAL '4 weeks' + GROUP BY pid, p.name, p.price, p.cost + HAVING p.price > 5000; + + In the example above, the WHERE clause is selecting rows by a + column that is not grouped, while the HAVING clause + is selecting groups with a price greater than 5000. + + + + + + + Select Lists + + + The table expression in the SELECT command + constructs an intermediate virtual table by possibly combining + tables, views, eliminating rows, grouping, etc. This table is + finally passed on to processing by the select list. The select + list determines which columns of the + intermediate table are retained. The simplest kind of select list + is * which retains all columns that the table + expression produces. Otherwise, a select list is a comma-separated + list of value expressions (as defined in ). For instance, it could be a list of + column names: + +SELECT a, b, c FROM ... + + The columns names a, b, and c are either the actual names of the + columns of table referenced in the FROM clause, or the aliases + given to them as explained in . + The name space available in the select list is the same as in the + WHERE clause (unless grouping is used, in which case it is the same + as in the HAVING clause). If more than one table has a column of + the same name, the table name must also be given, as in + +SELECT tbl1.a, tbl2.b, tbl1.c FROM ... + + (see also ). + + + + If an arbitrary value expression is used in the select list, it + conceptually adds a new virtual column to the returned table. The + value expression is effectively evaluated once for each retrieved + row with real values substituted for any column references. But + the expressions in the select list do not have to reference any + columns in the table expression of the FROM clause; they can be + constant arithmetic expressions as well, for instance. + + + + Column Labels + + + The entries in the select list can be assigned names for further + processing. The further processing in this case is + an optional sort specification and the client application (e.g., + column headers for display). For example: + +SELECT a AS value, b + c AS sum FROM ... + + The AS key word can in fact be omitted. + + + + If no name is chosen, the system assigns a default. For simple + column references, this is the name of the column. For function + calls, this is the name of the function. For complex expressions, + the system will generate a generic name. + + + + + The naming of output columns here is different from that done in + the FROM clause (see ). This + pipeline will in fact allow you to rename the same column twice, + but the name chosen in the select list is the one that will be + passed on. + + + + + + DISTINCT + + + After the select list has been processed, the result table may + optionally be subject to the elimination of duplicates. The + DISTINCT key word is written directly after the + SELECT to enable this: + +SELECT DISTINCT select_list ... + + (Instead of DISTINCT the word ALL + can be used to select the default behavior of retaining all rows.) + + + + Obviously, two rows are considered distinct if they differ in at + least one column value. NULLs are considered equal in this + consideration. + + + + Alternatively, an arbitrary expression can determine what rows are + to be considered distinct: + +SELECT DISTINCT ON (expression , expression ...) select_list ... + + Here expression is an arbitrary value + expression that is evaluated for all rows. A set of rows for + which all the expressions is equal are considered duplicates and + only the first row is kept in the output. Note that the + first row of a set is unpredictable unless the + query is sorted. + + + + The DISTINCT ON clause is not part of the SQL standard and is + sometimes considered bad style because of the indeterminate nature + of its results. With judicious use of GROUP BY and subselects in + FROM the construct can be avoided, but it is very often the much + more convenient alternative. + + + + + + Combining Queries + + + The results of two queries can be combined using the set operations + union, intersection, and difference. The syntax is + +query1 UNION ALL query2 +query1 INTERSECT ALL query2 +query1 EXCEPT ALL query2 + + query1 and + query2 are queries that can use any of + the features discussed up to this point. Set operations can also + be nested and chained, for example + +query1 UNION query2 UNION query3 + + which really says + +(query1 UNION query2) UNION query3 + + + + + UNION effectively appends the result of + query2 to the result of + query1 (although there is no guarantee + that this is the order in which the rows are actually returned) and + eliminates all duplicate rows, in the sense of DISTINCT, unless ALL + is specified. + + + + INTERSECT returns all rows that are both in the + result of query1 and in the result of + query2. Duplicate rows are eliminated + unless ALL is specified. + + + + EXCEPT returns all rows that are in the result + of query1 but not in the result of + query2. Again, duplicates are + eliminated unless ALL is specified. + + + + In order to calculate the union, intersection, or difference of two + queries, the two queries must be union compatible, + which means that they both return the same number of columns, and + that the corresponding columns have compatible data types, as + described in . + + + + + + Sorting Rows + + + After a query has produced an output table (after the select list + has been processed) it can optionally be sorted. If sorting is not + chosen, the rows will be returned in random order. The actual + order in that case will depend on the scan and join plan types and + the order on disk, but it must not be relied on. A particular + ordering can only be guaranteed if the sort step is explicitly + chosen. + + + + The ORDER BY clause specifies the sort order: + +SELECT select_list FROM table_expression ORDER BY column1 ASC | DESC , column2 ASC | DESC ... + + column1, etc., refer to select list + columns: It can either be the name of a column (either the + explicit column label or default name, as explained in ) or the number of a column. Some + examples: + +SELECT a, b FROM table1 ORDER BY a; +SELECT a + b AS sum, c FROM table1 ORDER BY sum; +SELECT a, sum(b) FROM table1 GROUP BY a ORDER BY 1; + + + + + As an extension to the SQL standard, Postgres also allows ordering + by arbitrary expressions: + +SELECT a, b FROM table1 ORDER BY a + b; + + References to column names in the FROM clause that are renamed in + the select list are also allowed: + +SELECT a AS b FROM table1 ORDER BY a; + + But this does not work in queries involving UNION, INTERSECT, or + EXCEPT, and is not portable. + + + + Each column specification may be followed by an optional ASC or + DESC to set the sort direction. ASC is default. Ascending order + puts smaller values first, where smaller is defined + in terms of the < operator. Similarly, + descending order is determined with the > + operator. + + + + If more than one sort column is specified the later entries are + used to sort the rows that are equal under the order imposed by the + earlier sort specifications. + + + + + LIMIT and OFFSET + + +SELECT select_list FROM table_expression ORDER BY sort_spec LIMIT { number | ALL } OFFSET number + + + + LIMIT allows you to retrieve just a portion of the rows that are + generated by the rest of the query. If a limit count is given, no + more than that many rows will be returned. If an offset is given, + that many rows will be skipped before starting to return rows. + + + + When using LIMIT, it is a good idea to use an ORDER BY clause that + constrains the result rows into a unique order. Otherwise you will + get an unpredictable subset of the query's rows---you may be asking + for the tenth through twentieth rows, but tenth through twentieth + in what ordering? The ordering is unknown, unless you specified + ORDER BY. + + + + The query optimizer takes LIMIT into account when generating a + query plan, so you are very likely to get different plans (yielding + different row orders) depending on what you give for LIMIT and + OFFSET. Thus, using different LIMIT/OFFSET values to select + different subsets of a query result will give + inconsistent results unless you enforce a predictable + result ordering with ORDER BY. This is not a bug; it is an + inherent consequence of the fact that SQL does not promise to + deliver the results of a query in any particular order unless ORDER + BY is used to constrain the order. + + + + diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index f994cbe0a0..843f2d08f5 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1,5 +1,5 @@ @@ -743,7 +743,7 @@ CAST ( 'string' AS type ) - In addition to this list, there are a number of contructs that can + In addition to this list, there are a number of constructs that can be classified as an expression but do not follow any general syntax rules. These generally have the semantics of a function or operator and are explained in the appropriate location in string' AS type ) A column can be referenced in the form: -corelation.columnname `['subscript`]' +correlation.columnname `['subscript`]' - corelation is either the name of a + correlation is either the name of a table, an alias for a table defined by means of a FROM clause, or the keyword NEW or OLD. (NEW and OLD can only appear in the action portion of a rule, - while other corelation names can be used in any SQL statement.) - The corelation name can be omitted if the column name is unique + while other correlation names can be used in any SQL statement.) + The correlation name can be omitted if the column name is unique across all the tables being used in the current query. If column is of an array type, then the optional subscript selects a specific @@ -895,8 +895,8 @@ sqrt(2) The precedence and associativity of the operators is hard-wired into the parser. Most operators have the same precedence and are left-associative. This may lead to non-intuitive behavior; for - example the boolean operators "<" and ">" have a different - precedence than the boolean operators "<=" and ">=". Also, + example the Boolean operators "<" and ">" have a different + precedence than the Boolean operators "<=" and ">=". Also, you will sometimes need to add parentheses when using combinations of binary and unary operators. For instance @@ -917,7 +917,7 @@ SELECT (5 &) ~ 6; - OperatorElement + Operator/Element Associativity Description @@ -1057,526 +1057,6 @@ SELECT (5 &) ~ 6; - - - Table Expressions - - - A table expression specifies a table. The - table expression contains a FROM clause that is optionally followed - by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions - simply refer to a table on disk, a so-called base table, but more - complex expressions can be used to modify or combine base tables in - various ways. - - - - The general syntax of the SELECT command is - -SELECT select_list table_expression - - - The select_list is a comma separated - list of value expressions as defined in - that specify the derived columns - of the query output table. Column names in the derived table that - is the result of the table_expression - can be used in the value expressions of - the select_list. - - - - The WHERE, GROUP BY, and HAVING clauses in the table expression - specify a pipeline of successive transformations performed on the - table derived in the FROM clause. The final transformed table that - is derived provides the input rows used to derive output rows as - specified by the select list of derived column value expressions. - - - - FROM clause - - - The FROM clause derives a table from one or more other tables - given in a comma-separated table reference list. - -FROM table_reference , table_reference , ... - - - A table reference may be a table name or a derived table such as a - subquery, a table join, or complex combinations of these. If more - than one table reference is listed in the FROM clause they are - CROSS JOINed (see below) to form the derived table that may then - be subject to transformations by the WHERE, GROUP BY, and HAVING - clauses and is finally the result of the overall table expression. - - - - If a table reference is a simple table name and it is the - supertable in a table inheritance hierarchy, rows of the table - include rows from all of its subtable successors unless the - keyword ONLY precedes the table name. - - - - Joined Tables - - - A joined table is a table derived from two other (real or - derived) tables according to the rules of the particular join - type. INNER, OUTER, NATURAL, and CROSS JOIN are supported. - - - - Join Types - - - CROSS JOIN - - - -T1 CROSS JOIN T2 - - - - For each combination of rows from - T1 and - T2 the derived table will contain a - row consisting of all columns in T1 - followed by all columns in T2. If - the tables have have N and M rows respectively, the joined - table will have N * M rows. A cross join is essentially an - INNER JOIN ON TRUE. - - - - - FROM T1 CROSS JOIN - T2 is equivalent to - FROM T1, - T2. - - - - - - - Qualified JOINs - - - -T1 { INNER | { LEFT | RIGHT | FULL } OUTER } JOIN T2 ON boolean expression -T1 { INNER | { LEFT | RIGHT | FULL } OUTER } JOIN T2 USING ( join column list ) - - - - The words INNER and OUTER are - optional for all JOINs. INNER is the default; - LEFT, RIGHT, and - FULL are for OUTER JOINs only. - - - - The join condition is specified in the - ON or USING clause. (The meaning of the join condition - depends on the particular join type; see below.) The ON - clause takes a boolean value expression of the same kind as is - used in a WHERE clause. The USING clause takes a - comma-separated list of column names, which the joined tables - must have in common, and joins the tables on the equality of - those columns as a set, resulting in a joined table having one - column for each common column listed and all of the other - columns from both tables. Thus, USING (a, b, - c) is equivalent to ON (t1.a = t2.a AND - t1.b = t2.b AND t1.c = t2.c) with the exception that - if ON is used there will be two columns a, b, and c in the - result, whereas with USING there will be only one of each. - - - - - INNER JOIN - - - - For each row R1 of T1, the joined table has a row for each - row in T2 that satisfies the join condition with R1. - - - - - - LEFT OUTER JOIN - - - - First, an INNER JOIN is performed. Then, for a row in T1 - that does not satisfy the join condition with any row in - T2, a joined row is returned with NULL values in columns of - T2. Thus, the joined table unconditionally has a row for each - row in T1. - - - - - - RIGHT OUTER JOIN - - - - This is like a left join, only that the result table will - unconditionally have a row for each row in T2. - - - - - - FULL OUTER JOIN - - - - First, an INNER JOIN is performed. Then, for each row in - T1 that does not satisfy the join condition with any row in - T2, a joined row is returned with null values in columns of - T2. Also, for each row of T2 that does not satisfy the - join condition with any row in T1, a joined row with null - values in the columns of T1 is returned. - - - - - - - - - NATURAL JOIN - - - -T1 NATURAL { INNER | { LEFT | RIGHT | FULL } OUTER JOIN T2 - - - A natural join creates a joined table where every pair of matching - column names between the two tables are merged into one column. The - join specification is effectively a USING clause containing all the - common column names and is otherwise like a Qualified JOIN. - - - - - - - Joins of all types can be chained together or nested where either - or both of T1 and - T2 may be JOINed tables. Parenthesis - can be used around JOIN clauses to control the join order which - are otherwise left to right. - - - - - Subqueries - - - Subqueries specifying a derived table must be enclosed in - parenthesis and must be named using an AS - clause. (See .) - - - -FROM (SELECT * FROM table1) AS alias_name - - - - This example is equivalent to FROM table1 AS - alias_name. Many subquieries can be written as table - joins instead. - - - - - Table and Column Aliases - - - A temporary name can be given to tables and complex table - references to be used for references to the derived table in - further processing. This is called a table - alias. - -FROM table_reference AS alias - - Here, alias can be any regular - identifier. The alias becomes the new name of the table - reference for the current query -- it is no longer possible to - refer to the table by the original name (if the table reference - was an ordinary base table). Thus - -SELECT * FROM my_table AS m WHERE my_table.a > 5; - - is not valid SQL syntax. What will happen instead, as a - Postgres extension, is that an implict - table reference is added to the FROM clause, so the query is - processed as if it was written as - -SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5; - - Table aliases are mainly for notational convenience, but it is - necessary to use them when joining a table to itself, e.g., - -SELECT * FROM my_table AS a CROSS JOIN my_table AS b ... - - Additionally, an alias is required if the table reference is a - subquery. - - - - Parenthesis are used to resolve ambiguities. The following - statement will assign the alias b to the - result of the join, unlike the previous example: - -SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ... - - - - - -FROM table_reference alias - - This form is equivalent the previously treated one; the - AS key word is noise. - - - - -FROM table_reference AS alias ( column1 , column2 , ... ) - - In addition to renaming the table as described above, the columns - of the table are also given temporary names. If less column - aliases are specified than the actual table has columns, the last - columns are not renamed. This syntax is especially useful for - self-joins or subqueries. - - - - - Examples - - - -FROM T1 INNER JOIN T2 USING (C) -FROM T1 LEFT OUTER JOIN T2 USING (C) -FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1 -FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2) - -FROM T1 NATURAL INNER JOIN T2 -FROM T1 NATURAL LEFT OUTER JOIN T2 -FROM T1 NATURAL RIGHT OUTER JOIN T2 -FROM T1 NATURAL FULL OUTER JOIN T2 - -FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3 -FROM (SELECT * FROM T1) DT1, T2, T3 - - - Above are some examples of joined tables and complex derived - tables. Notice how the AS clause renames or names a derived - table and how the optional comma-separated list of column names - that follows gives names or renames the columns. The last two - FROM clauses produce the same derived table from T1, T2, and T3. - The AS keyword was omitted in naming the subquery as DT1. The - keywords OUTER and INNER are noise that can be omitted also. - - - - - - - WHERE clause - - - The syntax of the WHERE clause is - -WHERE search condition - - where search condition is any value - expression as defined in that - returns a value of type boolean. - - - - After the processing of the FROM clause is done, each row of the - derived table is checked against the search condition. If the - result of the condition is true, the row is kept in the output - table, otherwise (that is, if the result is false or NULL) it is - discared. The search condition typically references at least some - column in the table generated in the FROM clause; this is not - required, but otherwise the WHERE clause will be fairly useless. - - - - - Before the implementation of the JOIN syntax, it was necessary to - put the join condition of an inner join in the WHERE clause. For - example, these table expressions are equivalent: - -FROM a, b WHERE a.id = b.id AND b.val > 5 - - and - -FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5 - - or perhaps even - -FROM a NATURAL JOIN b WHERE b.val > 5 - - Which one of these you use is mainly a matter of style. The JOIN - syntax in the FROM clause is probably not as portable to other - products. For outer joins there is no choice in any case: they - must be done in the FROM clause. - - - - -FROM FDT WHERE - C1 > 5 - -FROM FDT WHERE - C1 IN (1, 2, 3) -FROM FDT WHERE - C1 IN (SELECT C1 FROM T2) -FROM FDT WHERE - C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) - -FROM FDT WHERE - C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100 - -FROM FDT WHERE - EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1) - - - - In the examples above, FDT is the table derived in the FROM - clause. Rows that do not meet the search condition of the where - clause are eliminated from FDT. Notice the use of scalar - subqueries as value expressions (C2 assumed UNIQUE). Just like - any other query, the subqueries can employ complex table - expressions. Notice how FDT is referenced in the subqueries. - Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a - column in the derived input table of the subquery. Qualifying the - column name adds clarity even when it is not needed. The column - naming scope of an outer query extends into its inner queries. - - - - - - - GROUP BY and HAVING clauses - - - After passing the WHERE filter, the derived input table may be - subject to grouping, using the GROUP BY clause, and elimination of - group rows using the HAVING clause. (The HAVING clause can also - be used without GROUP BY, but then it is equivalent to the WHERE - clause.) - - - - In standard SQL, the GROUP BY clause takes a list of column names, - that specify a subrow, from the derived input table produced by - the previous WHERE or FROM clause and partitions the table into - groups with duplicate subrows such that within a column of the - subrow, no column value is distinct from other column values. The - resulting derived input table is a special type of table, called a - grouped table, which still contains all columns but only - references to columns of the grouped subrow, and group aggregates, - derived from any of the columns, may appear in derived column - value expressions in the query select list. When deriving an - output table from a query using a grouped input table, each output - row is derived from a corresponding group/partition of the grouped - table. Aggregates computed in a derived output column are - aggregates on the current partition/group of the grouped input - table being processed. Only one output table row results per - group/partition of the grouped input table. - - - - Postgres has extended the GROUP BY clause to allow some - non-standard, but useful behavior. Derived output columns, given - names using an AS clause in the query select list, may appear in - the GROUP BY clause in combination with, or instead of, the input - table column names. Tables may also be grouped by arbitrary - expressions. If output table column names appear in the GROUP BY - list, then the input table is augmented with additional columns of - the output table columns listed in the GROUP BY clause. The value - for each row in the additional columns is computed from the value - expression that defines the output column in the query select - list. The augmented input table is grouped by the column names - listed in the GROUP BY clause. The resulting grouped augmented - input table is then treated according standard SQL GROUP BY - semantics. Only the columns of the unaugmented input table in the - grouped subrow (if any), and group aggregates, derived from any of - the columns of the unaugmented input table, may be referenced in - the value expressions of the derived output columns of the - query. Output columns derived with an aggregate expression cannot - be named in the GROUP BY clause. - - - - A HAVING clause may optionally follow a GROUP BY clause. The - HAVING clause selects or eliminates, depending on which - perspective is taken, groups from the grouped table derived in the - GROUP BY clause that precedes it. The search condition is the - same type of expression allowed in a WHERE clause and may - reference any of the input table column names in the grouped - subrow, but may not reference any others or any named output - columns. When the search condition results in TRUE the group is - retained, otherwise the group is eliminated. - - - - - ORDER BY and LIMIT clauses - - - ORDER BY and LIMIT clauses are not clauses of a table expression. - They are optional clauses that may follow a query expression and - are discussed here because they are commonly used with the - clauses above. - - - - ORDER BY takes a comma-separated list of columns and performs a - cascaded ordering of the table by the columns listed, in the - order listed. The keyword DESC or ASC may follow any column name - or expression in the list to specify descending or ascending - ordering, respectively. Ascending order is the default. The - ORDER BY clause conforms to the SQL standard but is extended in - Postgres. Postgres allows ORDER BY to reference both output - table columns, as named in the select list using the AS clause, - and input table columns, as given by the table derived in the - FROM clause and other previous clauses. Postgres also extends - ORDER BY to allow ordering by arbitrary expressions. If used in a - query with a GROUP BY clause, the ORDER BY clause can only - reference output table column names and grouped input table - columns. - - - - LIMIT is not a standard SQL clause. LIMIT is a Postgres - extension that limits the number of rows that will be returned - from a query. The rows returned by a query using the LIMIT - clause are random if no ORDER BY clause is specified. A LIMIT - clause may optionally be followed by an OFFSET clause which - specifies a number of rows to be skipped in the output table - before returning the number of rows specified in the LIMIT - clause. - - -]]> - - @@ -44,6 +44,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.24 2001/01/13 23:58:55 &intro; &syntax; + &queries; &datatype; &func; &typeconv;