<!-- $PostgreSQL: pgsql/doc/src/sgml/query.sgml,v 1.48 2006/10/21 23:12:57 tgl Exp $ -->

<chapter id="tutorial-sql">
<title>The <acronym>SQL</acronym> Language</title>

<sect1 id="tutorial-sql-intro">
<title>Introduction</title>

<para>
This chapter provides an overview of how to use
<acronym>SQL</acronym> to perform simple operations. This
tutorial is only intended to give you an introduction and is in no
way a complete tutorial on <acronym>SQL</acronym>. Numerous books
have been written on <acronym>SQL</acronym>, including <xref
linkend="MELT93"> and <xref linkend="DATE97">.
You should be aware that some <productname>PostgreSQL</productname>
language features are extensions to the standard.
</para>

<para>
In the examples that follow, we assume that you have created a
database named <literal>mydb</literal>, as described in the previous
chapter, and have been able to start <application>psql</application>.
</para>

<para>
Examples in this manual can also be found in the
<productname>PostgreSQL</productname> source distribution
in the directory <filename>src/tutorial/</filename>. To use those
files, first change to that directory and run <application>make</>:

<screen>
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
<prompt>$</prompt> <userinput>make</userinput>
</screen>

This creates the scripts and compiles the C files containing user-defined
functions and types. (If you installed a pre-packaged version of
<productname>PostgreSQL</productname> rather than building from source,
look for a directory named <filename>tutorial</> within the
<productname>PostgreSQL</productname> documentation. The <quote>make</>
part should already have been done for you.)
Then, to start the tutorial, do the following:

<screen>
<prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/tutorial</userinput>
<prompt>$</prompt> <userinput>psql -s mydb</userinput>
<computeroutput>
...
</computeroutput>

<prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
</screen>

The <literal>\i</literal> command reads in commands from the
specified file. The <literal>-s</literal> option puts you in
single-step mode, which pauses before sending each statement to the
server. The commands used in this section are in the file
<filename>basics.sql</filename>.
</para>
</sect1>

<sect1 id="tutorial-concepts">
<title>Concepts</title>

<para>
<indexterm><primary>relational database</primary></indexterm>
<indexterm><primary>hierarchical database</primary></indexterm>
<indexterm><primary>object-oriented database</primary></indexterm>
<indexterm><primary>relation</primary></indexterm>
<indexterm><primary>table</primary></indexterm>

<productname>PostgreSQL</productname> is a <firstterm>relational
database management system</firstterm> (<acronym>RDBMS</acronym>).
That means it is a system for managing data stored in
<firstterm>relations</firstterm>. Relation is essentially a
mathematical term for <firstterm>table</firstterm>. The notion of
storing data in tables is so commonplace today that it might
seem inherently obvious, but there are a number of other ways of
organizing databases. Files and directories on Unix-like
operating systems form an example of a hierarchical database. A
more modern development is the object-oriented database.
</para>

<para>
<indexterm><primary>row</primary></indexterm>
<indexterm><primary>column</primary></indexterm>

Each table is a named collection of <firstterm>rows</firstterm>.
Each row of a given table has the same set of named
<firstterm>columns</firstterm>,
and each column is of a specific data type. Whereas columns have
a fixed order in each row, it is important to remember that SQL
does not guarantee the order of the rows within the table in any
way (although they can be explicitly sorted for display).
</para>

<para>
<indexterm><primary>database cluster</primary></indexterm>
<indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>

Tables are grouped into databases, and a collection of databases
managed by a single <productname>PostgreSQL</productname> server
instance constitutes a database <firstterm>cluster</firstterm>.
</para>
</sect1>

<sect1 id="tutorial-table">
<title>Creating a New Table</title>

<indexterm zone="tutorial-table">
<primary>CREATE TABLE</primary>
</indexterm>

<para>
You can create a new table by specifying the table
name, along with all column names and their types:

<programlisting>
CREATE TABLE weather (
    city            varchar(80),
    temp_lo         int,           -- low temperature
    temp_hi         int,           -- high temperature
    prcp            real,          -- precipitation
    date            date
);
</programlisting>

You can enter this into <command>psql</command> with the line
breaks. <command>psql</command> will recognize that the command
is not terminated until the semicolon.
</para>

<para>
White space (i.e., spaces, tabs, and newlines) may be used freely
in SQL commands. That means you can type the command aligned
differently than above, or even all on one line. Two dashes
(<quote><literal>--</literal></quote>) introduce comments.
Whatever follows them is ignored up to the end of the line. SQL
is case insensitive about key words and identifiers, except
when identifiers are double-quoted to preserve the case (not done
above).
</para>

<para>
<type>varchar(80)</type> specifies a data type that can store
arbitrary character strings up to 80 characters in length.
<type>int</type> is the normal integer type. <type>real</type> is
a type for storing single precision floating-point numbers.
<type>date</type> should be self-explanatory. (Yes, the column of
type <type>date</type> is also named <literal>date</literal>.
This may be convenient or confusing &mdash; you choose.)
</para>

<para>
<productname>PostgreSQL</productname> supports the standard
<acronym>SQL</acronym> types <type>int</type>,
<type>smallint</type>, <type>real</type>, <type>double
precision</type>, <type>char(<replaceable>N</>)</type>,
<type>varchar(<replaceable>N</>)</type>, <type>date</type>,
<type>time</type>, <type>timestamp</type>, and
<type>interval</type>, as well as other types of general utility
and a rich set of geometric types.
<productname>PostgreSQL</productname> can be customized with an
arbitrary number of user-defined data types. Consequently, type
names are not syntactical key words, except where required to
support special cases in the <acronym>SQL</acronym> standard.
</para>

<para>
The second example will store cities and their associated
geographical location:
<programlisting>
CREATE TABLE cities (
    name            varchar(80),
    location        point
);
</programlisting>
The <type>point</type> type is an example of a
<productname>PostgreSQL</productname>-specific data type.
</para>

<para>
<indexterm>
<primary>DROP TABLE</primary>
</indexterm>

Finally, it should be mentioned that if you don't need a table any
longer or want to recreate it differently, you can remove it using
the following command:
<synopsis>
DROP TABLE <replaceable>tablename</replaceable>;
</synopsis>
</para>
</sect1>

<sect1 id="tutorial-populate">
<title>Populating a Table With Rows</title>

<indexterm zone="tutorial-populate">
<primary>INSERT</primary>
</indexterm>

<para>
The <command>INSERT</command> statement is used to populate a table with
rows:

<programlisting>
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
</programlisting>

Note that all data types use rather obvious input formats.
Constants that are not simple numeric values usually must be
surrounded by single quotes (<literal>'</>), as in the example.
The
<type>date</type> type is actually quite flexible in what it
accepts, but for this tutorial we will stick to the unambiguous
format shown here.
</para>

<para>
The <type>point</type> type requires a coordinate pair as input,
as shown here:
<programlisting>
INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
</programlisting>
</para>

<para>
The syntax used so far requires you to remember the order of the
columns. An alternative syntax allows you to list the columns
explicitly:
<programlisting>
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
    VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
</programlisting>
You can list the columns in a different order if you wish or
even omit some columns, e.g., if the precipitation is unknown:
<programlisting>
INSERT INTO weather (date, city, temp_hi, temp_lo)
    VALUES ('1994-11-29', 'Hayward', 54, 37);
</programlisting>
Many developers consider explicitly listing the columns better
style than relying on the order implicitly.
</para>

<para>
Please enter all the commands shown above so you have some data to
work with in the following sections.
</para>

<para>
<indexterm>
<primary>COPY</primary>
</indexterm>

You could also have used <command>COPY</command> to load large
amounts of data from flat-text files. This is usually faster
because the <command>COPY</command> command is optimized for this
application while allowing less flexibility than
<command>INSERT</command>. An example would be:

<programlisting>
COPY weather FROM '/home/user/weather.txt';
</programlisting>

where the file name for the source file must be available on the
machine running the backend server process, not on the client, since
the backend server reads the file directly. You can read more about the
<command>COPY</command> command in <xref linkend="sql-copy"
endterm="sql-copy-title">.
</para>
</sect1>

<sect1 id="tutorial-select">
<title>Querying a Table</title>

<para>
<indexterm><primary>query</primary></indexterm>
<indexterm><primary>SELECT</primary></indexterm>

To retrieve data from a table, the table is
<firstterm>queried</firstterm>. An <acronym>SQL</acronym>
<command>SELECT</command> statement is used to do this. The
statement is divided into a select list (the part that lists the
columns to be returned), a table list (the part that lists the
tables from which to retrieve the data), and an optional
qualification (the part that specifies any restrictions). For
example, to retrieve all the rows of table
<classname>weather</classname>, type:
<programlisting>
SELECT * FROM weather;
</programlisting>
Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
<footnote>
<para>
While <literal>SELECT *</literal> is useful for off-the-cuff
queries, it is widely considered bad style in production code,
since adding a column to the table would change the results.
</para>
</footnote>
So the same result would be obtained with:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
</programlisting>

The output should be:

<screen>
     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      43 |      57 |    0 | 1994-11-29
 Hayward       |      37 |      54 |      | 1994-11-29
(3 rows)
</screen>
</para>

<para>
You can write expressions, not just simple column references, in the
select list. For example, you can do:
<programlisting>
SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
</programlisting>
This should give:
<screen>
     city      | temp_avg |    date
---------------+----------+------------
 San Francisco |       48 | 1994-11-27
 San Francisco |       50 | 1994-11-29
 Hayward       |       45 | 1994-11-29
(3 rows)
</screen>
Notice how the <literal>AS</literal> clause is used to relabel the
output column. (The <literal>AS</literal> clause is optional.)
</para>
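
<para>
The Hayward average above is 45 rather than 45.5 because
<structfield>temp_hi</structfield> and <structfield>temp_lo</structfield>
are both <type>int</type>, so <literal>/</literal> performs integer
division and discards the fractional part. A minimal variation, assuming
a fractional average is what you want, divides by a constant with a
decimal point instead:
<programlisting>
-- 2.0 is a numeric constant, so the division is no longer integer
-- division and the Hayward row yields 45.5 instead of 45.
SELECT city, (temp_hi+temp_lo)/2.0 AS temp_avg, date FROM weather;
</programlisting>
</para>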

<para>
A query can be <quote>qualified</> by adding a <literal>WHERE</>
clause that specifies which rows are wanted. The <literal>WHERE</>
clause contains a Boolean (truth value) expression, and only rows for
which the Boolean expression is true are returned. The usual
Boolean operators (<literal>AND</literal>,
<literal>OR</literal>, and <literal>NOT</literal>) are allowed in
the qualification. For example, the following
retrieves the weather of San Francisco on rainy days:

<programlisting>
SELECT * FROM weather
    WHERE city = 'San Francisco' AND prcp > 0.0;
</programlisting>
Result:
<screen>
     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
(1 row)
</screen>
</para>
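
<para>
Note that a condition such as <literal>prcp &gt; 0.0</literal> is never
true for the Hayward row, even without the city test. Its
<structfield>prcp</structfield> value was omitted when the row was
inserted, so it is null, and comparing a null value with an ordinary
comparison operator yields null rather than true. A minimal sketch,
assuming the rows entered earlier in this chapter, selects rows whose
value is unknown by testing with <literal>IS NULL</literal> instead:
<programlisting>
-- prcp = NULL would not work here: it yields null, never true.
SELECT city, date FROM weather
    WHERE prcp IS NULL;
</programlisting>
This should return only the Hayward row dated 1994-11-29.
</para>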

<para>
<indexterm><primary>ORDER BY</primary></indexterm>

You can request that the results of a query
be returned in sorted order:

<programlisting>
SELECT * FROM weather
    ORDER BY city;
</programlisting>

<screen>
     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 Hayward       |      37 |      54 |      | 1994-11-29
 San Francisco |      43 |      57 |    0 | 1994-11-29
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
</screen>

In this example, the sort order isn't fully specified, and so you
might get the San Francisco rows in either order. But you'd always
get the results shown above if you do

<programlisting>
SELECT * FROM weather
    ORDER BY city, temp_lo;
</programlisting>
</para>
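
<para>
Each sort key sorts in ascending order by default; writing
<literal>DESC</literal> after a key reverses it. As a sketch, assuming
the rows entered earlier, the following lists the high temperatures from
warmest to coolest. Because the three <structfield>temp_hi</structfield>
values are all different, this order is fully determined:
<programlisting>
SELECT city, temp_hi FROM weather
    ORDER BY temp_hi DESC;
</programlisting>
<screen>
     city      | temp_hi
---------------+---------
 San Francisco |      57
 Hayward       |      54
 San Francisco |      50
(3 rows)
</screen>
</para>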

<para>
<indexterm><primary>DISTINCT</primary></indexterm>
<indexterm><primary>duplicate</primary></indexterm>

You can request that duplicate rows be removed from the result of
a query:

<programlisting>
SELECT DISTINCT city
    FROM weather;
</programlisting>

<screen>
     city
---------------
 Hayward
 San Francisco
(2 rows)
</screen>

Here again, the result row ordering might vary.
You can ensure consistent results by using <literal>DISTINCT</literal> and
<literal>ORDER BY</literal> together:
<footnote>
<para>
In some database systems, including older versions of
<productname>PostgreSQL</productname>, the implementation of
<literal>DISTINCT</literal> automatically orders the rows and
so <literal>ORDER BY</literal> is unnecessary. But this is not
required by the SQL standard, and current
<productname>PostgreSQL</productname> doesn't guarantee that
<literal>DISTINCT</literal> causes the rows to be ordered.
</para>
</footnote>

<programlisting>
SELECT DISTINCT city
    FROM weather
    ORDER BY city;
</programlisting>
</para>
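
<para>
<literal>DISTINCT</literal> compares entire result rows, not just the
first column. As an illustration, assuming the three rows entered
earlier, listing both <structfield>city</structfield> and
<structfield>date</structfield> removes nothing, because no
city and date combination occurs twice:
<programlisting>
SELECT DISTINCT city, date
    FROM weather
    ORDER BY city, date;
</programlisting>
<screen>
     city      |    date
---------------+------------
 Hayward       | 1994-11-29
 San Francisco | 1994-11-27
 San Francisco | 1994-11-29
(3 rows)
</screen>
</para>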
</sect1>

<sect1 id="tutorial-join">
<title>Joins Between Tables</title>

<indexterm zone="tutorial-join">
<primary>join</primary>
</indexterm>

<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
table in such a way that multiple rows of the table are being
processed at the same time. A query that accesses multiple rows
of the same or different tables at one time is called a
<firstterm>join</firstterm> query. As an example, say you wish to
list all the weather records together with the location of the
associated city. To do that, we need to compare the city column of
each row of the weather table with the name column of all rows in
the cities table, and select the pairs of rows where these values match.
<note>
<para>
This is only a conceptual model. The join is usually performed
in a more efficient manner than actually comparing each possible
pair of rows, but this is invisible to the user.
</para>
</note>
This would be accomplished by the following query:

<programlisting>
SELECT *
    FROM weather, cities
    WHERE city = name;
</programlisting>

<screen>
     city      | temp_lo | temp_hi | prcp |    date    |     name      | location
---------------+---------+---------+------+------------+---------------+-----------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
(2 rows)
</screen>

</para>

<para>
Observe two things about the result set:
<itemizedlist>
<listitem>
<para>
There is no result row for the city of Hayward. This is
because there is no matching entry in the
<classname>cities</classname> table for Hayward, so the join
ignores the unmatched rows in the weather table. We will see
shortly how this can be fixed.
</para>
</listitem>

<listitem>
<para>
There are two columns containing the city name. This is
correct because the lists of columns of the
<classname>weather</classname> and the
<classname>cities</classname> tables are concatenated. In
practice this is undesirable, though, so you will probably want
to list the output columns explicitly rather than using
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>

<formalpara>
<title>Exercise:</title>

<para>
Attempt to find out the semantics of this query when the
<literal>WHERE</literal> clause is omitted.
</para>
</formalpara>

<para>
Since the columns all had different names, the parser
automatically found out which table they belong to. If there
were duplicate column names in the two tables you'd need to
<firstterm>qualify</> the column names to show which one you
meant, as in:

<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
       weather.prcp, weather.date, cities.location
    FROM weather, cities
    WHERE cities.name = weather.city;
</programlisting>

It is widely considered good style to qualify all column names
in a join query, so that the query won't fail if a duplicate
column name is later added to one of the tables.
</para>

<para>
Join queries of the kind seen thus far can also be written in this
alternative form:

<programlisting>
SELECT *
    FROM weather INNER JOIN cities ON (weather.city = cities.name);
</programlisting>

This syntax is not as commonly used as the one above, but we show
it here to help you understand the following topics.
</para>
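
<para>
The key word <literal>INNER</literal> is optional: a bare
<literal>JOIN</literal> also denotes an inner join. As a sketch, the
qualified, explicit column names recommended above combine naturally
with this syntax and avoid the duplicated city column:
<programlisting>
SELECT weather.city, weather.date, cities.location
    FROM weather JOIN cities ON (weather.city = cities.name);
</programlisting>
As with the earlier form, only the two San Francisco rows should come
back, in no guaranteed order.
</para>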

<para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>

Now we will figure out how we can get the Hayward records back in.
What we want the query to do is to scan the
<classname>weather</classname> table and for each row to find the
matching <classname>cities</classname> row(s). If no matching row is
found we want some <quote>empty values</quote> to be substituted
for the <classname>cities</classname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
joins we have seen so far are inner joins.) The command looks
like this:

<programlisting>
SELECT *
    FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);

     city      | temp_lo | temp_hi | prcp |    date    |     name      | location
---------------+---------+---------+------+------------+---------------+-----------
 Hayward       |      37 |      54 |      | 1994-11-29 |               |
 San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
</programlisting>

This query is called a <firstterm>left outer
join</firstterm> because the table mentioned on the left of the
join operator will have each of its rows in the output at least
once, whereas the table on the right will only have those rows
output that match some row of the left table. When outputting a
left-table row for which there is no right-table match, empty (null)
values are substituted for the right-table columns.
</para>
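
<para>
Because the substituted values are nulls, a left outer join can also
find the rows that have no match at all. A small sketch, assuming the
tutorial data entered earlier, keeps only the joined rows in which
<literal>cities.name</literal> came back null:
<programlisting>
-- Weather records whose city has no entry in the cities table.
SELECT weather.city, weather.date
    FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
    WHERE cities.name IS NULL;
</programlisting>
<screen>
  city   |    date
---------+------------
 Hayward | 1994-11-29
(1 row)
</screen>
</para>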

<formalpara>
<title>Exercise:</title>

<para>
There are also right outer joins and full outer joins. Try to
find out what those do.
</para>
</formalpara>

<para>
<indexterm><primary>join</primary><secondary>self</secondary></indexterm>
<indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>

We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
of other weather records. So we need to compare the
<structfield>temp_lo</> and <structfield>temp_hi</> columns of
each <classname>weather</classname> row to the
<structfield>temp_lo</structfield> and
<structfield>temp_hi</structfield> columns of all other
<classname>weather</classname> rows. We can do this with the
following query:

<programlisting>
SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
    W2.city, W2.temp_lo AS low, W2.temp_hi AS high
    FROM weather W1, weather W2
    WHERE W1.temp_lo &lt; W2.temp_lo
    AND W1.temp_hi > W2.temp_hi;

     city      | low | high |     city      | low | high
---------------+-----+------+---------------+-----+------
 San Francisco |  43 |   57 | San Francisco |  46 |   50
 Hayward       |  37 |   54 | San Francisco |  46 |   50
(2 rows)
</programlisting>

Here we have relabeled the weather table as <literal>W1</> and
<literal>W2</> to be able to distinguish the left and right side
of the join. You can also use these kinds of aliases in other
queries to save some typing, e.g.:
<programlisting>
SELECT *
    FROM weather w, cities c
    WHERE w.city = c.name;
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
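
<para>
Table aliases work the same way with the <literal>JOIN</literal>
syntax. For example, the left outer join shown earlier could be written
more compactly as follows (a sketch; <literal>w</literal> and
<literal>c</literal> are arbitrary alias names chosen for illustration):
<programlisting>
SELECT w.city, w.temp_lo, w.temp_hi, w.prcp, w.date, c.location
    FROM weather w LEFT OUTER JOIN cities c ON (w.city = c.name);
</programlisting>
This still returns all three weather rows, with a null
<structfield>location</structfield> for Hayward.
</para>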

</sect1>


<sect1 id="tutorial-agg">
<title>Aggregate Functions</title>

<indexterm zone="tutorial-agg">
<primary>aggregate function</primary>
</indexterm>

<para>
<indexterm><primary>average</primary></indexterm>
<indexterm><primary>count</primary></indexterm>
<indexterm><primary>max</primary></indexterm>
<indexterm><primary>min</primary></indexterm>
<indexterm><primary>sum</primary></indexterm>

Like most other relational database products,
<productname>PostgreSQL</productname> supports
aggregate functions.
An aggregate function computes a single result from multiple input rows.
For example, there are aggregates to compute the
<function>count</function>, <function>sum</function>,
<function>avg</function> (average), <function>max</function> (maximum) and
<function>min</function> (minimum) over a set of rows.
</para>
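
<para>
Several aggregates can be computed in a single query. The following
sketch assumes that <classname>weather</classname> still holds only the
three sample rows entered earlier in this chapter:
<programlisting>
SELECT count(*), count(prcp), avg(temp_lo), min(temp_lo), sum(prcp)
    FROM weather;
</programlisting>
Here <literal>count(*)</literal> counts all three rows. However,
<literal>count(prcp)</literal> should report 2. Most aggregate
functions, including <literal>count</literal> applied to an expression,
ignore null inputs, and the Hayward row was entered without a
precipitation value.
</para>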

<para>
As an example, we can find the highest low-temperature reading anywhere
with

<programlisting>
SELECT max(temp_lo) FROM weather;
</programlisting>

<screen>
 max
-----
  46
(1 row)
</screen>
</para>

<para>
<indexterm><primary>subquery</primary></indexterm>

If we wanted to know what city (or cities) that reading occurred in,
we might try

<programlisting>
SELECT city FROM weather WHERE temp_lo = max(temp_lo);     <lineannotation>WRONG</lineannotation>
</programlisting>

but this will not work since the aggregate
<function>max</function> cannot be used in the
<literal>WHERE</literal> clause. (This restriction exists because
the <literal>WHERE</literal> clause determines which rows will be
included in the aggregate calculation; so obviously it has to be evaluated
before aggregate functions are computed.)
However, as is often the case, the query can be restated to accomplish
the desired result, here by using a <firstterm>subquery</firstterm>:

<programlisting>
SELECT city FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
</programlisting>

<screen>
     city
---------------
 San Francisco
(1 row)
</screen>

This is OK because the subquery is an independent computation
that computes its own aggregate separately from what is happening
in the outer query.
</para>
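
<para>
The outer query can return any columns of the matching rows. If several
rows tied for the maximum, each of them would be returned. For example,
the following query also shows the date of the reading:
<programlisting>
SELECT city, date FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
</programlisting>
With the sample data, this query should return the single San Francisco
row dated 1994-11-27.
</para>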

<para>
<indexterm><primary>GROUP BY</primary></indexterm>
<indexterm><primary>HAVING</primary></indexterm>

Aggregates are also very useful in combination with <literal>GROUP
BY</literal> clauses. For example, we can get the maximum low
temperature observed in each city with

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;
</programlisting>

<screen>
     city      | max
---------------+-----
 Hayward       |  37
 San Francisco |  46
(2 rows)
</screen>

which gives us one output row per city. Each aggregate result is
computed over the table rows matching that city.
We can filter these grouped rows using <literal>HAVING</literal>:

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40;
</programlisting>

<screen>
  city   | max
---------+-----
 Hayward |  37
(1 row)
</screen>

which gives us the same results for only the cities that have all
<literal>temp_lo</> values below 40. Finally, if we only care about
cities whose names begin with <quote><literal>S</literal></quote>,
we might do

<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
    GROUP BY city
    HAVING max(temp_lo) < 40;
</programlisting>
<calloutlist>
<callout arearefs="co.tutorial-agg-like">
<para>
The <literal>LIKE</literal> operator does pattern matching and
is explained in <xref linkend="functions-matching">.
</para>
</callout>
</calloutlist>
</para>
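
<para>
A grouped query can compute several aggregates for each group. The
following query assumes the same sample data. It counts the readings
for each city and shows the range of that city's low temperatures:
<programlisting>
SELECT city, count(*), min(temp_lo), max(temp_lo)
    FROM weather
    GROUP BY city;
</programlisting>
This query should produce one row per city, with a count of 2 for
San Francisco and 1 for Hayward.
</para>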

<para>
It is important to understand the interaction between aggregates and
<acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
The fundamental difference between <literal>WHERE</literal> and
<literal>HAVING</literal> is this: <literal>WHERE</literal> selects
input rows before groups and aggregates are computed (thus, it controls
which rows go into the aggregate computation), whereas
<literal>HAVING</literal> selects group rows after groups and
aggregates are computed. Consequently, the
<literal>WHERE</literal> clause must not contain aggregate functions;
it makes no sense to try to use an aggregate to determine which rows
will be inputs to the aggregates. On the other hand, the
<literal>HAVING</literal> clause normally contains aggregate functions.
(Strictly speaking, you are allowed to write a <literal>HAVING</literal>
clause that doesn't use aggregates, but it's seldom useful. The same
condition could be used more efficiently at the <literal>WHERE</literal>
stage.)
</para>
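
<para>
The following two queries show the difference more concretely. Both
assume the sample data. The first query filters individual rows before
grouping. The second filters whole groups after the counts have been
computed:
<programlisting>
SELECT city, count(*)
    FROM weather
    WHERE temp_lo > 45
    GROUP BY city;

SELECT city, count(*)
    FROM weather
    GROUP BY city
    HAVING count(*) > 1;
</programlisting>
Both queries should return only San Francisco, but with different
counts. The first query counts only the one San Francisco reading whose
low exceeded 45. The second counts both of its readings. If you move the
aggregate condition into <literal>WHERE</literal>, as in
<literal>WHERE count(*) > 1</literal>, the query is rejected with an
error.
</para>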

<para>
In the previous example, we can apply the city name restriction in
<literal>WHERE</literal>, since it needs no aggregate. This is
more efficient than adding the restriction to <literal>HAVING</literal>,
because we avoid doing the grouping and aggregate calculations
for all rows that fail the <literal>WHERE</literal> check.
</para>
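
<para>
For comparison, here is the same query with the city restriction moved
into <literal>HAVING</literal>. This is legal because
<structfield>city</structfield> is a grouping column, and the query
should give the same result. However, every city is grouped and
aggregated before the name test discards it:
<programlisting>
SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40 AND city LIKE 'S%';
</programlisting>
</para>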

</sect1>


<sect1 id="tutorial-update">
<title>Updates</title>

<indexterm zone="tutorial-update">
<primary>UPDATE</primary>
</indexterm>

<para>
You can update existing rows using the
<command>UPDATE</command> command.
Suppose you discover the temperature readings are
all off by 2 degrees after November 28. You can correct the
data as follows:

<programlisting>
UPDATE weather
    SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
    WHERE date > '1994-11-28';
</programlisting>
</para>
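
<para>
Before you run an <command>UPDATE</command> like this, it can be worth
checking which rows its <literal>WHERE</literal> clause selects. To do
so, run a <command>SELECT</command> with the same condition:
<programlisting>
SELECT * FROM weather WHERE date > '1994-11-28';
</programlisting>
With the sample data, this query should list the two readings dated
1994-11-29. These are exactly the rows that the
<command>UPDATE</command> changes.
</para>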

<para>
Look at the new state of the data:
<programlisting>
SELECT * FROM weather;
</programlisting>

<screen>
     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      41 |      55 |    0 | 1994-11-29
 Hayward       |      35 |      52 |      | 1994-11-29
(3 rows)
</screen>
</para>

</sect1>


<sect1 id="tutorial-delete">
<title>Deletions</title>

<indexterm zone="tutorial-delete">
<primary>DELETE</primary>
</indexterm>

<para>
Rows can be removed from a table using the <command>DELETE</command>
command.
Suppose you are no longer interested in the weather of Hayward.
Then you can do the following to delete those rows from the table:
<programlisting>
DELETE FROM weather WHERE city = 'Hayward';
</programlisting>

All weather records belonging to Hayward are removed.

<programlisting>
SELECT * FROM weather;
</programlisting>

<screen>
     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      41 |      55 |    0 | 1994-11-29
(2 rows)
</screen>
</para>
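
<para>
An aggregate query is another way to confirm that no Hayward rows
remain, without listing the whole table:
<programlisting>
SELECT count(*) FROM weather WHERE city = 'Hayward';
</programlisting>
After the <command>DELETE</command>, this query should return 0.
</para>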

<para>
One should be wary of statements of the form
<synopsis>
DELETE FROM <replaceable>tablename</replaceable>;
</synopsis>

Without a qualification, <command>DELETE</command> will
remove <emphasis>all</> rows from the given table, leaving it
empty. The system will not request confirmation before
doing this!
</para>
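
<para>
One way to guard against such mistakes is to run the
<command>DELETE</command> inside a transaction block and inspect the
result before making it permanent. This is a suggestion rather than a
required step:
<programlisting>
BEGIN;
DELETE FROM weather WHERE city = 'Hayward';
SELECT * FROM weather;   -- check that only the intended rows are gone
ROLLBACK;                -- undoes the DELETE; use COMMIT instead to keep it
</programlisting>
</para>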

</sect1>

</chapter>