<!--
$Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.17 2001/01/13 23:58:55 petere Exp $
-->

<chapter id="query">
<title>The Query Language</title>

<para>
The <productname>Postgres</productname> query language is a variant of
the <acronym>SQL</acronym> standard. It
has many extensions to <acronym>SQL</acronym> such as an
extensible type system,
inheritance, functions and production rules. These are
features carried over from the original
<productname>Postgres</productname> query
language, <productname>PostQuel</productname>.
This chapter provides an overview
of how to use <productname>Postgres</productname>
<acronym>SQL</acronym> to perform simple operations.
This manual is only intended to give you an idea of our
flavor of <acronym>SQL</acronym> and is in no way a complete tutorial on
<acronym>SQL</acronym>. Numerous books have been written on
<acronym>SQL92</acronym>, including
<xref linkend="MELT93" endterm="MELT93"> and
<xref linkend="DATE97" endterm="DATE97">.
You should be aware that some language features
are extensions to the standard.
</para>

<sect1 id="query-psql">
<title>Interactive Monitor</title>

<para>
In the examples that follow, we assume that you have
created the mydb database as described in the previous
subsection and have started <application>psql</application>.
Examples in this manual can also be found in the source distribution
in the directory <filename>src/tutorial/</filename>. Refer to the
<filename>README</filename> file in that directory for how to use them. To
start the tutorial, do the following:

<screen>
<prompt>$</prompt> <userinput>cd <replaceable>...</replaceable>/src/tutorial</userinput>
<prompt>$</prompt> <userinput>psql -s mydb</userinput>
<computeroutput>
Welcome to the POSTGRESQL interactive sql monitor:
Please read the file COPYRIGHT for copyright terms of POSTGRESQL

type \? for help on slash commands
type \q to quit
type \g or terminate with semicolon to execute query
You are currently connected to the database: mydb
</computeroutput>

<prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
</screen>
</para>

<para>
The <literal>\i</literal> command reads in queries from the specified
file. The <literal>-s</literal> option puts you in single-step mode, which
pauses before sending each query to the backend. The queries
in this chapter are in the file <filename>basics.sql</filename>.
</para>

<para>
<application>psql</application>
has a variety of <literal>\d</literal> commands for showing system information.
Consult these commands for more details;
for a listing, type <literal>\?</literal> at the <application>psql</application> prompt.
</para>
</sect1>

<sect1 id="query-concepts">
<title>Concepts</title>

<para>
The fundamental notion in <productname>Postgres</productname> is
that of a <firstterm>table</firstterm>, which is a named
collection of <firstterm>rows</firstterm>. Each row has the same
set of named <firstterm>columns</firstterm>, and each column is of
a specific type. Furthermore, each row has a permanent
<firstterm>object identifier</firstterm> (<acronym>OID</acronym>)
that is unique throughout the database cluster. Historically,
tables have been called classes in
<productname>Postgres</productname>, rows are object instances,
and columns are attributes. This makes sense if you consider the
object-relational aspects of the database system, but in this
manual we will use the customary <acronym>SQL</acronym>
terminology. As previously discussed,
tables are grouped into databases, and a collection of databases
managed by a single <application>postmaster</application> process
constitutes a database cluster.
</para>
</sect1>

<sect1 id="query-table">
<title>Creating a New Table</title>

<para>
You can create a new table by specifying the table
name, along with all column names and their types:

<programlisting>
CREATE TABLE weather (
    city            varchar(80),
    temp_lo         int,           -- low temperature
    temp_hi         int,           -- high temperature
    prcp            real,          -- precipitation
    date            date
);
</programlisting>
</para>

<para>
Note that both keywords and identifiers are case-insensitive;
identifiers can preserve case by surrounding them with
double quotes, as allowed by <acronym>SQL92</acronym>.
<productname>Postgres</productname> <acronym>SQL</acronym>
supports the usual
<acronym>SQL</acronym> types <type>int</type>,
<type>float</type>, <type>real</type>, <type>smallint</type>,
<type>char(N)</type>,
<type>varchar(N)</type>, <type>date</type>, <type>time</type>,
and <type>timestamp</type>, as well as other types of general utility and
a rich set of geometric types. As we will
see later, <productname>Postgres</productname> can be customized
with an arbitrary number of
user-defined data types. Consequently, type names are
not syntactic keywords, except where required to support special
cases in the <acronym>SQL92</acronym> standard.
So far, the <productname>Postgres</productname>
<command>CREATE TABLE</command> command
looks exactly like
the command used to create a table in a traditional
relational system. However, we will presently see that
tables have properties that are extensions of the
relational model.
</para>
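<para>
To make the case rule concrete, here is a small sketch. The table names
<classname>case_demo</classname> and <classname>Case_Demo</classname> are
hypothetical and used only for illustration. Unquoted identifiers are
converted to lower case, so differently capitalized spellings of an
unquoted name all refer to the same table, whereas a double-quoted name
keeps its exact case and therefore names a separate table:

<programlisting>
-- Unquoted names are folded to lower case: these three statements
-- all refer to the one table case_demo.
CREATE TABLE Case_Demo (id int);
INSERT INTO CASE_DEMO VALUES (1);
select * from case_demo;        -- keywords are case-insensitive, too

-- A quoted name keeps its case exactly, so this is a second, distinct
-- table that must always be written as "Case_Demo".
CREATE TABLE "Case_Demo" (id int);
INSERT INTO "Case_Demo" VALUES (2);
SELECT * FROM "Case_Demo";
</programlisting>

Both demonstration tables can be removed afterwards with
<command>DROP TABLE case_demo</command> and
<command>DROP TABLE "Case_Demo"</command>.
</para>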
</sect1>

<sect1 id="query-populate">
<title>Populating a Table with Rows</title>

<para>
The <command>INSERT</command> statement is used to populate a table with
rows:

<programlisting>
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
</programlisting>
</para>
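<para>
The query output in the next section also shows two rows recorded on
1994-11-29. Assuming the <classname>weather</classname> table created
above, they could be entered as sketched here. The second statement uses
the form of <command>INSERT</command> that names its columns, which lets
you give the columns in any order or leave some out. An omitted column
receives its default value, which is null here because
<classname>weather</classname> declares no defaults; this matches the
empty precipitation value shown for Hayward.

<programlisting>
INSERT INTO weather VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');

-- Naming the columns explicitly; prcp is left out, so it is set to null.
INSERT INTO weather (city, temp_lo, temp_hi, date)
    VALUES ('Hayward', 37, 54, '1994-11-29');
</programlisting>
</para>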

<para>
You can also use <command>COPY</command> to load large
amounts of data from flat (<acronym>ASCII</acronym>) files.
This is usually faster because the data is read (or written) as a
single atomic
transaction directly to or from the target table. An example would be:

<programlisting>
COPY weather FROM '/home/user/weather.txt' USING DELIMITERS '|';
</programlisting>

where the path name for the source file must be available to the
backend server
machine, not the client, since the backend server reads the file directly.
</para>
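<para>
As a sketch of the file format, assuming the default text format of
<command>COPY</command>: the file holds one row per line, with the column
values in table order separated by the delimiter (here <literal>|</literal>),
and a null value is written as <literal>\N</literal>. Under those
assumptions, a <filename>/home/user/weather.txt</filename> holding the three
rows shown in the next section might look like this:

<programlisting>
San Francisco|46|50|0.25|1994-11-27
San Francisco|43|57|0|1994-11-29
Hayward|37|54|\N|1994-11-29
</programlisting>

An empty field is not treated as null in this format; for a numeric
column such as <literal>prcp</literal> it would be rejected as invalid input.
</para>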
</sect1>

<sect1 id="query-query">
<title>Querying a Table</title>

<para>
The <classname>weather</classname> table can be queried with normal relational
selection and projection queries. A <acronym>SQL</acronym>
<command>SELECT</command>
statement is used to do this. The statement is divided into
a target list (the part that lists the columns to be
returned) and a qualification (the part that specifies
any restrictions). For example, to retrieve all the
rows of <classname>weather</classname>, type:
<programlisting>
SELECT * FROM weather;
</programlisting>

and the output should be:
<programlisting>
+--------------+---------+---------+------+------------+
|city          | temp_lo | temp_hi | prcp |    date    |
+--------------+---------+---------+------+------------+
|San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
+--------------+---------+---------+------+------------+
|San Francisco | 43      | 57      | 0    | 1994-11-29 |
+--------------+---------+---------+------+------------+
|Hayward       | 37      | 54      |      | 1994-11-29 |
+--------------+---------+---------+------+------------+
</programlisting>
You may specify arbitrary expressions in the target list. For
example, you can do:
<programlisting>
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
</programlisting>
</para>
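<para>
Because <literal>temp_hi</literal> and <literal>temp_lo</literal> are both
<type>int</type>, the expression above uses integer division, which
truncates: for the Hayward row, <literal>(54+37)/2</literal> yields 45, not
45.5. One way to keep the fractional part, sketched here, is to divide by a
non-integer constant so that the division is carried out in a non-integer
type:

<programlisting>
-- Dividing by 2.0 instead of 2 avoids truncating the average.
SELECT city, (temp_hi+temp_lo)/2.0 AS temp_avg, date FROM weather;
</programlisting>
</para>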

<para>
Arbitrary Boolean operators
(<command>AND</command>, <command>OR</command> and
<command>NOT</command>) are
allowed in the qualification of any query. For example,

<programlisting>
SELECT * FROM weather
    WHERE city = 'San Francisco'
    AND prcp > 0.0;
</programlisting>

results in:
<programlisting>
+--------------+---------+---------+------+------------+
|city          | temp_lo | temp_hi | prcp |    date    |
+--------------+---------+---------+------+------------+
|San Francisco | 46      | 50      | 0.25 | 1994-11-27 |
+--------------+---------+---------+------+------------+
</programlisting>
</para>
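<para>
The operators can be combined freely. In the sketch below, which assumes
the three rows shown above, the parentheses around the
<command>OR</command> are needed because <command>AND</command> binds more
tightly than <command>OR</command>. Note also that a comparison with a null
value yields null rather than true or false, so the Hayward row, whose
<literal>prcp</literal> is null, satisfies neither
<literal>prcp > 0.0</literal> nor <literal>NOT (prcp > 0.0)</literal>:

<programlisting>
SELECT * FROM weather
    WHERE (city = 'San Francisco' OR city = 'Hayward')
      AND NOT (prcp > 0.0);
</programlisting>

With the rows shown above, this returns only the San Francisco reading of
1994-11-29, whose <literal>prcp</literal> is 0.
</para>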

<para>
As a final note, you can specify that the results of a
select be returned in a <firstterm>sorted order</firstterm>
(<command>ORDER BY</command>) or with duplicate rows removed
(<command>DISTINCT</command>):

<programlisting>
SELECT DISTINCT city
    FROM weather
    ORDER BY city;
</programlisting>
</para>
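<para>
<command>ORDER BY</command> also accepts several sort keys, each optionally
followed by <literal>DESC</literal> for descending order. A short sketch
against the <classname>weather</classname> table:

<programlisting>
-- Sort by city; within each city, show the most recent date first.
SELECT city, temp_lo, temp_hi, date
    FROM weather
    ORDER BY city, date DESC;
</programlisting>

With the three rows shown earlier, the Hayward row comes first, followed by
the two San Francisco rows, 1994-11-29 before 1994-11-27.
</para>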
</sect1>

<sect1 id="query-selectinto">
<title>Redirecting SELECT Queries</title>

<para>
Any <command>SELECT</command> query can be redirected to a new table:
<programlisting>
SELECT * INTO TABLE temp FROM weather;
</programlisting>
</para>

<para>
This forms an implicit <command>CREATE</command> command, creating a new
table <classname>temp</classname> with the column names and types specified
in the target list of the <command>SELECT INTO</command> command. We can
then, of course, perform any operations on the resulting
table that we can perform on other tables.
</para>
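<para>
The query being redirected can carry a qualification like any other. A
sketch, where <classname>sf_weather</classname> is a hypothetical table
name:

<programlisting>
-- Copy only the San Francisco rows into a new table ...
SELECT * INTO TABLE sf_weather
    FROM weather
    WHERE city = 'San Francisco';

-- ... which can then be queried like any other table.
SELECT count(*) FROM sf_weather;
</programlisting>

When the copy is no longer needed, <command>DROP TABLE sf_weather</command>
removes it.
</para>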
</sect1>

<sect1 id="query-join">
<title>Joins Between Tables</title>

<para>
Thus far, our queries have only accessed one table at a
time. Queries can access multiple tables at once, or
access the same table in such a way that multiple
rows of the table are being processed at the same
time. A query that accesses multiple rows of the
same or different tables at one time is called a join
query.
As an example, say we wish to find all pairs of rows in which
one row's temperature range lies entirely within the other's. In
effect, we need to compare the temp_lo and temp_hi
columns of each <classname>weather</classname> row to the temp_lo and
temp_hi columns of all other <classname>weather</classname> rows.
<note>
<para>
This is only a conceptual model. The actual join may
be performed in a more efficient manner, but this is invisible
to the user.
</para>
</note>

We can do this with the following query:

<programlisting>
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
       W2.city, W2.temp_lo AS low, W2.temp_hi AS high
    FROM weather W1, weather W2
    WHERE W1.temp_lo < W2.temp_lo
    AND W1.temp_hi > W2.temp_hi;

+--------------+-----+------+---------------+-----+------+
|city          | low | high | city          | low | high |
+--------------+-----+------+---------------+-----+------+
|San Francisco | 43  | 57   | San Francisco | 46  | 50   |
+--------------+-----+------+---------------+-----+------+
|Hayward       | 37  | 54   | San Francisco | 46  | 50   |
+--------------+-----+------+---------------+-----+------+
</programlisting>

<note>
<para>
The semantics of such a join are
that the qualification
is a truth expression defined for the Cartesian product of
the tables indicated in the query. For those rows in
the Cartesian product for which the qualification is true,
<productname>Postgres</productname> computes and returns the
values specified in the target list.
<productname>Postgres</productname> <acronym>SQL</acronym>
does not remove duplicate values from such results.
This means that the same target-list values
can be returned several times;
this frequently happens when Boolean expressions are connected
with an <command>OR</command>. To remove such duplicates, you must use
the <command>SELECT DISTINCT</command> statement.
</para>
</note>
</para>

<para>
In this case, both <literal>W1</literal> and
<literal>W2</literal> are surrogates for a
row of the table weather, and both range over all
rows of the table. (In the terminology of most
database systems, <literal>W1</literal> and <literal>W2</literal>
are known as <firstterm>range variables</firstterm>.)
A query can contain an arbitrary number of
table names and surrogates.
</para>
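<para>
The range variables need not refer to the same table. As a sketch, assume
a second, hypothetical table <classname>city_info</classname> that records
the state each city is in; the qualification then pairs each
<classname>weather</classname> row with the <classname>city_info</classname>
row of the same name:

<programlisting>
-- city_info is a hypothetical table, created here only for this example.
CREATE TABLE city_info (
    name            varchar(80),
    state           char(2)
);
INSERT INTO city_info VALUES ('San Francisco', 'CA');
INSERT INTO city_info VALUES ('Hayward', 'CA');

-- W ranges over weather, C over city_info.
SELECT W.city, C.state, W.temp_lo, W.temp_hi, W.date
    FROM weather W, city_info C
    WHERE W.city = C.name;
</programlisting>
</para>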
</sect1>

<sect1 id="query-update">
<title>Updates</title>

<para>
You can update existing rows using the
<command>UPDATE</command> command.
Suppose you discover that the temperature readings taken
after November 28 are all 2 degrees too high. You can correct the
data as follows:

<programlisting>
UPDATE weather
    SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
    WHERE date > '1994-11-28';
</programlisting>
</para>
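<para>
Before running an <command>UPDATE</command>, it can be useful to check which
rows its qualification selects. In this sketch, the
<command>SELECT</command> uses the same <command>WHERE</command> clause as the
<command>UPDATE</command> above and shows the values the
<command>UPDATE</command> would store, without changing anything:

<programlisting>
SELECT city, date,
       temp_lo - 2 AS new_lo,
       temp_hi - 2 AS new_hi
    FROM weather
    WHERE date > '1994-11-28';
</programlisting>

Run before the <command>UPDATE</command> against the three rows shown earlier,
it lists the two 1994-11-29 readings, with <literal>new_lo</literal> and
<literal>new_hi</literal> of 41 and 55 for San Francisco and 35 and 52 for
Hayward.
</para>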
</sect1>

<sect1 id="query-delete">
<title>Deletions</title>

<para>
Deletions are performed using the <command>DELETE</command> command:
<programlisting>
DELETE FROM weather WHERE city = 'Hayward';
</programlisting>

All weather records belonging to Hayward are removed.
One should be wary of queries of the form
<programlisting>
DELETE FROM <replaceable>tablename</replaceable>;
</programlisting>

Without a qualification, <command>DELETE</command> will simply
remove all rows from the given table, leaving it
empty. The system will not request confirmation before
doing this.
</para>
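<para>
One way to guard against an accidental unqualified
<command>DELETE</command> is to run it inside a transaction block, check the
result, and only then decide whether to keep it. A sketch using the
<command>BEGIN</command> and <command>ROLLBACK</command> commands:

<programlisting>
BEGIN;
DELETE FROM weather;              -- no qualification: removes every row
SELECT count(*) FROM weather;     -- 0 within this transaction
ROLLBACK;                         -- undoes the DELETE
SELECT count(*) FROM weather;     -- the rows are back
</programlisting>

Had the result been what you wanted, <command>COMMIT</command> instead of
<command>ROLLBACK</command> would make the deletion permanent.
</para>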
</sect1>

<sect1 id="query-agg">
<title>Using Aggregate Functions</title>

<para>
Like most other relational database products,
<productname>PostgreSQL</productname> supports
aggregate functions.
An aggregate function computes a single result from multiple input rows.
For example, there are aggregates to compute the
<function>count</function>, <function>sum</function>,
<function>avg</function> (average), <function>max</function> (maximum) and
<function>min</function> (minimum) over a set of rows.
</para>

<para>
It is important to understand the interaction between aggregates and
SQL's <command>WHERE</command> and <command>HAVING</command> clauses.
The fundamental difference between <command>WHERE</command> and
<command>HAVING</command> is this: <command>WHERE</command> selects
input rows before groups and aggregates are computed (thus, it controls
which rows go into the aggregate computation), whereas
<command>HAVING</command> selects group rows after groups and
aggregates are computed. Thus, the
<command>WHERE</command> clause may not contain aggregate functions;
it makes no sense to try to use an aggregate to determine which rows
will be inputs to the aggregates. On the other hand,
<command>HAVING</command> clauses normally contain aggregate functions.
(Strictly speaking, you are allowed to write a <command>HAVING</command>
clause that doesn't use aggregates, but it's wasteful; the same condition
could be used more efficiently at the <command>WHERE</command> stage.)
</para>
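<para>
A sketch of the difference, assuming the San Francisco rows shown earlier
in this chapter. The first query discards individual rows before grouping;
the second discards whole groups after the aggregates have been computed:

<programlisting>
-- WHERE filters rows before grouping: count, per city, only the
-- readings whose low temperature was above 45.
SELECT city, count(*)
    FROM weather
    WHERE temp_lo > 45
    GROUP BY city;

-- HAVING filters groups after aggregation: count all readings of each
-- city whose highest low temperature was above 45.
SELECT city, count(*)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) > 45;
</programlisting>

Only one San Francisco reading has a low temperature above 45, so the first
query reports a count of 1 for San Francisco, while the second reports 2,
because <command>HAVING</command> keeps or drops the San Francisco group as
a whole.
</para>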

<para>
As an example, we can find the highest low-temperature reading anywhere
with

<programlisting>
SELECT max(temp_lo) FROM weather;
</programlisting>

If we want to know what city (or cities) that reading occurred in,
we might try

<programlisting>
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
</programlisting>

but this will not work since the aggregate
<function>max</function> can't be used in
<command>WHERE</command>. However, as is often the case, the query can be
restated to accomplish the intended result, here by using a
<firstterm>subselect</firstterm>:

<programlisting>
SELECT city FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
</programlisting>

This is OK because the subselect is an independent computation that
computes its own aggregate separately from what's happening in the outer
select.
</para>

<para>
Aggregates are also very useful in combination with
<command>GROUP BY</command> clauses. For example, we can get the
maximum low temperature observed in each city with

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;
</programlisting>

which gives us one output row per city. We can filter these grouped
rows using <command>HAVING</command>:

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING min(temp_lo) < 0;
</programlisting>

which gives us the same results for only the cities that have some
below-zero readings. Finally, if we only care about cities whose
names begin with "<literal>P</literal>", we might do

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'P%'
    GROUP BY city
    HAVING min(temp_lo) < 0;
</programlisting>

Note that we can apply the city-name restriction in
<command>WHERE</command>, since it needs no aggregate. This is
more efficient than adding the restriction to <command>HAVING</command>,
because we avoid doing the grouping and aggregate calculations
for all rows that fail the <command>WHERE</command> check.
</para>
</sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->