<!--
$PostgreSQL: pgsql/doc/src/sgml/advanced.sgml,v 1.40 2004/03/30 22:08:50 momjian Exp $
-->
<chapter id="tutorial-advanced">
<title>Advanced Features</title>
<sect1 id="tutorial-advanced-intro">
<title>Introduction</title>
<para>
In the previous chapter we have covered the basics of using
<acronym>SQL</acronym> to store and access your data in
<productname>PostgreSQL</productname>. We will now discuss some
more advanced features of <acronym>SQL</acronym> that simplify
management and prevent loss or corruption of your data. Finally,
we will look at some <productname>PostgreSQL</productname>
extensions.
</para>
<para>
This chapter occasionally refers to examples found in <xref
linkend="tutorial-sql"> to change or improve them, so it helps
to have read that chapter first. Some examples from
this chapter can also be found in
<filename>advanced.sql</filename> in the tutorial directory. This
file also contains some example data to load, which is not
repeated here. (Refer to <xref linkend="tutorial-sql-intro"> for
how to use the file.)
</para>
</sect1>
<sect1 id="tutorial-views">
<title>Views</title>
<indexterm zone="tutorial-views">
<primary>view</primary>
</indexterm>
<para>
Refer back to the queries in <xref linkend="tutorial-join">.
Suppose the combined listing of weather records and city location
is of particular interest to your application, but you do not want
to type the query each time you need it. You can create a
<firstterm>view</firstterm> over the query, which gives a name to
the query that you can refer to like an ordinary table.
<programlisting>
CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

SELECT * FROM myview;
</programlisting>
</para>
<para>
Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the
structure of your tables, which may change as your application
evolves, behind consistent interfaces.
</para>
<para>
Views can be used in almost any place a real table can be used.
Building views upon other views is not uncommon. You may cut down
on the difficulty of building complex queries by constructing them
in smaller, easier-to-verify pieces, using views. Views may be
used to reveal specific table columns to users that legitimately
need access to some of the data, but who shouldn't be able to look
at the whole table.
</para>
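<para>
As a rough sketch of that last point, a view can expose only some of
the columns of <classname>weather</classname>. The view name
<literal>weather_temps</literal> is made up for this illustration,
and the <command>GRANT</command> assumes that ordinary users hold no
privileges on <classname>weather</classname> itself:
<programlisting>
-- Hypothetical view: leave out prcp, show only the temperatures
CREATE VIEW weather_temps AS
    SELECT city, date, temp_lo, temp_hi
        FROM weather;

-- Let everyone read the view, but not the underlying table
GRANT SELECT ON weather_temps TO PUBLIC;

SELECT * FROM weather_temps;
</programlisting>
</para>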
<para>
Views differ from <quote>real tables</quote> in that they are
not, by default, updatable. If they join together several tables,
it may be troublesome to update certain columns since the
<emphasis>real</emphasis> update that must take place requires
identifying the relevant rows in the source tables. This is
discussed further in <xref linkend="rules-views-update">.
</para>
</sect1>
<sect1 id="tutorial-fk">
<title>Foreign Keys</title>
<indexterm zone="tutorial-fk">
<primary>foreign key</primary>
</indexterm>
<indexterm zone="tutorial-fk">
<primary>referential integrity</primary>
</indexterm>
<para>
Recall the <classname>weather</classname> and
<classname>cities</classname> tables from <xref
linkend="tutorial-sql">. Consider the following problem: You
want to make sure that no one can insert rows in the
<classname>weather</classname> table that do not have a matching
entry in the <classname>cities</classname> table. This is called
maintaining the <firstterm>referential integrity</firstterm> of
your data. In simplistic database systems this would be
implemented (if at all) by first looking at the
<classname>cities</classname> table to check if a matching record
exists, and then inserting or rejecting the new
<classname>weather</classname> records. This approach has a
number of problems and is very inconvenient, so
<productname>PostgreSQL</productname> can do this for you.
</para>
<para>
The new declaration of the tables would look like this:
<programlisting>
CREATE TABLE cities (
    city      varchar(80) primary key,
    location  point
);

CREATE TABLE weather (
    city      varchar(80) references cities,
    temp_lo   int,
    temp_hi   int,
    prcp      real,
    date      date
);
</programlisting>
Now try inserting an invalid record:
<programlisting>
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
</programlisting>
<screen>
ERROR: insert or update on table "weather" violates foreign key constraint "$1"
DETAIL: Key (city)=(Berkeley) is not present in table "cities".
</screen>
</para>
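<para>
The same insert is accepted once the referenced city exists. A
minimal sketch, in which the coordinates are only illustrative:
<programlisting>
-- Parent row first; the location value is an assumed example
INSERT INTO cities VALUES ('Berkeley', '(-122.3, 37.9)');

-- Now the weather row finds its matching entry in cities
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
</programlisting>
</para>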
<para>
The behavior of foreign keys can be finely tuned to your
application. We will not go beyond this simple example in this
tutorial, but just refer you to <xref linkend="ddl">
for more information. Making correct use of
foreign keys will definitely improve the quality of your database
applications, so you are strongly encouraged to learn about them.
</para>
</sect1>
<sect1 id="tutorial-transactions">
<title>Transactions</title>
<indexterm zone="tutorial-transactions">
<primary>transaction</primary>
</indexterm>
<para>
<firstterm>Transactions</> are a fundamental concept of all database
systems. The essential point of a transaction is that it bundles
multiple steps into a single, all-or-nothing operation. The intermediate
states between the steps are not visible to other concurrent transactions,
and if some failure occurs that prevents the transaction from completing,
then none of the steps affect the database at all.
</para>
<para>
For example, consider a bank database that contains balances for various
customer accounts, as well as total deposit balances for branches.
Suppose that we want to record a payment of $100.00 from Alice's account
to Bob's account. Simplifying outrageously, the SQL commands for this
might look like
<programlisting>
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
</programlisting>
</para>
<para>
The details of these commands are not important here; the important
point is that there are several separate updates involved to accomplish
this rather simple operation. Our bank's officers will want to be
assured that either all these updates happen, or none of them happen.
It would certainly not do for a system failure to result in Bob
receiving $100.00 that was not debited from Alice. Nor would Alice long
remain a happy customer if she was debited without Bob being credited.
We need a guarantee that if something goes wrong partway through the
operation, none of the steps executed so far will take effect. Grouping
the updates into a <firstterm>transaction</> gives us this guarantee.
A transaction is said to be <firstterm>atomic</>: from the point of
view of other transactions, it either happens completely or not at all.
</para>
<para>
We also want a
guarantee that once a transaction is completed and acknowledged by
the database system, it has indeed been permanently recorded
and won't be lost even if a crash ensues shortly thereafter.
For example, if we are recording a cash withdrawal by Bob,
we do not want any chance that the debit to his account will
disappear in a crash just as he walks out the bank door.
A transactional database guarantees that all the updates made by
a transaction are logged in permanent storage (i.e., on disk) before
the transaction is reported complete.
</para>
<para>
Another important property of transactional databases is closely
related to the notion of atomic updates: when multiple transactions
are running concurrently, each one should not be able to see the
incomplete changes made by others. For example, if one transaction
is busy totalling all the branch balances, it would not do for it
to include the debit from Alice's branch but not the credit to
Bob's branch, nor vice versa. So transactions must be all-or-nothing
not only in terms of their permanent effect on the database, but
also in terms of their visibility as they happen. The updates made
so far by an open transaction are invisible to other transactions
until the transaction completes, whereupon all the updates become
visible simultaneously.
</para>
<para>
In <productname>PostgreSQL</>, a transaction is set up by surrounding
the SQL commands of the transaction with
<command>BEGIN</> and <command>COMMIT</> commands. So our banking
transaction would actually look like
<programlisting>
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
-- etc etc
COMMIT;
</programlisting>
</para>
<para>
If, partway through the transaction, we decide we do not want to
commit (perhaps we just noticed that Alice's balance went negative),
we can issue the command <command>ROLLBACK</> instead of
<command>COMMIT</>, and all our updates so far will be canceled.
</para>
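<para>
For instance, a session that abandons the transfer might look like the
following sketch. It assumes the <classname>accounts</classname> table
from the example above and leaves the decision about Alice's balance
to whoever is issuing the commands:
<programlisting>
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
-- Inspect the result before going any further
SELECT balance FROM accounts WHERE name = 'Alice';
-- Suppose the balance turned out negative: undo everything since BEGIN
ROLLBACK;
</programlisting>
After the <command>ROLLBACK</command>, Alice's balance is what it was
before the <command>BEGIN</command>.
</para>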
<para>
<productname>PostgreSQL</> actually treats every SQL statement as being
executed within a transaction. If you do not issue a <command>BEGIN</>
command,
then each individual statement has an implicit <command>BEGIN</> and
(if successful) <command>COMMIT</> wrapped around it. A group of
statements surrounded by <command>BEGIN</> and <command>COMMIT</>
is sometimes called a <firstterm>transaction block</>.
</para>
<note>
<para>
Some client libraries issue <command>BEGIN</> and <command>COMMIT</>
commands automatically, so that you may get the effect of transaction
blocks without asking. Check the documentation for the interface
you are using.
</para>
</note>
</sect1>
<sect1 id="tutorial-inheritance">
<title>Inheritance</title>
<indexterm zone="tutorial-inheritance">
<primary>inheritance</primary>
</indexterm>
<para>
Inheritance is a concept from object-oriented databases. It opens
up interesting new possibilities of database design.
</para>
<para>
Let's create two tables: A table <classname>cities</classname>
and a table <classname>capitals</classname>. Naturally, capitals
are also cities, so you want some way to show the capitals
implicitly when you list all cities. If you're really clever you
might invent some scheme like this:
<programlisting>
CREATE TABLE capitals (
    name        text,
    population  real,
    altitude    int,    -- (in ft)
    state       char(2)
);

CREATE TABLE non_capitals (
    name        text,
    population  real,
    altitude    int     -- (in ft)
);

CREATE VIEW cities AS
    SELECT name, population, altitude FROM capitals
        UNION
    SELECT name, population, altitude FROM non_capitals;
</programlisting>
This works for querying, but it becomes awkward as soon as you
need to update several rows, to name just one problem.
</para>
<para>
A better solution is this:
<programlisting>
CREATE TABLE cities (
    name        text,
    population  real,
    altitude    int     -- (in ft)
);

CREATE TABLE capitals (
    state       char(2)
) INHERITS (cities);
</programlisting>
</para>
<para>
In this case, a row of <classname>capitals</classname>
<firstterm>inherits</firstterm> all columns (<structfield>name</>,
<structfield>population</>, and <structfield>altitude</>) from its
<firstterm>parent</firstterm>, <classname>cities</classname>. The
type of the column <structfield>name</structfield> is
<type>text</type>, a native <productname>PostgreSQL</productname>
type for variable-length character strings. State capitals have
an extra column, <structfield>state</structfield>, that shows their
state. In <productname>PostgreSQL</productname>, a table can inherit
from zero or more other tables.
</para>
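<para>
The tutorial's <filename>advanced.sql</filename> provides data for
these tables. If it is not at hand, rows along the following lines
should reproduce the results shown in this section. The altitudes are
chosen to match those results; the remaining cities and all population
figures are only illustrative:
<programlisting>
INSERT INTO cities VALUES ('Las Vegas', 2.583E+5, 2174);
INSERT INTO cities VALUES ('Mariposa', 1200, 1953);
INSERT INTO cities VALUES ('San Francisco', 7.24E+5, 63);

-- Rows in capitals supply the inherited columns first, then state
INSERT INTO capitals VALUES ('Sacramento', 3.694E+5, 30, 'CA');
INSERT INTO capitals VALUES ('Madison', 1.913E+5, 845, 'WI');
</programlisting>
</para>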
<para>
For example, the following query finds the names of all cities,
including state capitals, that are located at an altitude
over 500 ft.:
<programlisting>
SELECT name, altitude
    FROM cities
    WHERE altitude &gt; 500;
</programlisting>
which returns:
<screen>
   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953
 Madison   |      845
(3 rows)
</screen>
</para>
<para>
On the other hand, the following query finds
all the cities that are not state capitals and
are situated at an altitude over 500 ft.:
<programlisting>
SELECT name, altitude
    FROM ONLY cities
    WHERE altitude &gt; 500;
</programlisting>
<screen>
   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953
(2 rows)
</screen>
</para>
<para>
Here the <literal>ONLY</literal> before <literal>cities</literal>
indicates that the query should be run over only the
<classname>cities</classname> table, and not tables below
<classname>cities</classname> in the inheritance hierarchy. Many
of the commands that we have already discussed --
<command>SELECT</command>, <command>UPDATE</command>, and
<command>DELETE</command> -- support this <literal>ONLY</literal>
notation.
</para>
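<para>
As a sketch of the same notation in a data-modifying command, the
following statement (the adjustment itself is arbitrary) changes rows
stored in <classname>cities</classname> proper and leaves the rows of
<classname>capitals</classname> alone:
<programlisting>
-- Affects only rows stored in cities itself, not those in capitals
UPDATE ONLY cities SET altitude = altitude + 10
    WHERE altitude &gt; 500;
</programlisting>
Without <literal>ONLY</literal>, the update would also apply to
matching rows in <classname>capitals</classname>.
</para>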
</sect1>
<sect1 id="tutorial-storedprocs">
<title>Stored Procedures</title>
<indexterm zone="tutorial-storedprocs">
<primary>stored procedures</primary>
</indexterm>
<para> Stored procedures are pieces of code that run inside the
database system. Functions and procedures can be written in a
number of languages; most built-in code is implemented in C. The
<quote>basic</quote> loadable procedural language for
<productname>PostgreSQL</productname> is <xref linkend="plpgsql">.
Other available languages include <xref
linkend="plperl">, <xref linkend="pltcl">, and <xref
linkend="plpython">.
</para>
<para> Stored procedures are helpful in several ways:
<itemizedlist>
<listitem><para> Centralizing data validation code in the
database </para>
<para> Your system may use client software written in several
languages, perhaps with a <quote>web application</quote>
implemented in PHP, a <quote>server application</quote> implemented
in Java, and a <quote>report writer</quote> implemented in Perl.
In the absence of stored procedures, you will likely find that data
validation code must be implemented multiple times, in multiple
languages, once for each application.</para>
<para> When data validation is implemented in stored procedures
running in the database, it behaves uniformly for all these
systems, and you do not need to worry about keeping validation
procedures synchronized across the languages.</para>
</listitem>
<listitem><para> Reducing round trips between client and server
</para>
<para>A stored procedure may submit multiple queries, looking up
information and adding in links to additional tables. This takes
place without requiring that the client submit multiple queries,
and without requiring any added network traffic.
</para>
<para> As a matter of course, the queries share a single
transaction context, and there may also be savings in query
planning, since the plans will be similar between invocations
of a given stored procedure. </para></listitem>
<listitem><para> Simplifying queries </para>
<para> For instance, if you commonly check the top-level domain
(TLD) of domain names, you might create a stored procedure for this
purpose, and so be able to use queries such as <command>SELECT domain,
tld(domain) FROM domains;</command> instead of having to put verbose
code using <function>substr()</function> into each query.
</para>
<para> It is particularly convenient to use scripting languages
like Perl, Tcl, and Python to <quote>grovel through strings</quote>
since they are designed for <quote>text processing.</quote></para>
<para> The binding to the R statistical language allows
implementing complex statistical queries inside the database,
instead of having to draw the data out.</para>
</listitem>
<listitem><para> Increasing the level of abstraction</para>
<para> If data is accessed exclusively through stored procedures,
then the structures of tables may be changed without there needing
to be any visible change in the API used by programmers. In some
systems, users are <emphasis>only</emphasis> allowed access to
stored procedures to update data, and cannot do direct updates to
tables.
</para>
</listitem>
</itemizedlist>
</para>
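<para>
A <function>tld()</function> function like the one mentioned above
could be sketched as a plain SQL function. The
<classname>domains</classname> table and the regular expression are
assumptions made for illustration; the expression simply returns the
trailing run of characters that contains no dot:
<programlisting>
-- Hypothetical helper: return the text after the last dot
CREATE FUNCTION tld(text) RETURNS text AS '
    SELECT substring($1 from ''[^.]+$'');
' LANGUAGE SQL IMMUTABLE;

SELECT domain, tld(domain) FROM domains;
</programlisting>
Keeping the expression in one function means that a change to the
rule has to be made in only one place.
</para>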
<para> These benefits build on one another: careful use of stored
procedures can simultaneously improve reliability and performance,
whilst simplifying database access code and improving portability
across client platforms and languages. For instance, consider that
a stored procedure can cheaply query tables in the database to
validate the correctness of data provided as input. </para>
<para> Instead of requiring a whole series of queries to create an
object, and to look up parent/subsidiary objects to link it to, a
stored procedure can do all of this efficiently in the database
server, improving performance, and eliminating whole classes of
errors. </para>
</sect1>
<sect1 id="tutorial-triggers">
<title>Triggers</title>
<indexterm zone="tutorial-triggers">
<primary>triggers</primary>
</indexterm>
<para> Triggers run a function either before or after a
data-modifying operation (<command>INSERT</command>,
<command>UPDATE</command>, or <command>DELETE</command>) on a
table. Common uses include the following: </para>
<itemizedlist>
<listitem><para> Data Validation </para>
<para> Instead of explicitly coding validation checks as part of a
stored procedure, they may be introduced as <command>BEFORE</command>
triggers. The trigger function checks the input values, raising an
exception if it finds invalid input.</para>
<para> Note that this is how foreign key checks are implemented in
<productname>PostgreSQL</productname>; when you define a foreign
key, you will see a message similar to the following:
<screen>
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
</screen></para>
<para> In some cases, it may be appropriate for a trigger function
to insert data in order to <emphasis>make</emphasis> the input valid. For
instance, if a newly created object needs a status code in a status
table, the trigger might insert that row automatically.</para>
</listitem>
<listitem><para> Audit logs </para>
<para> One may use <command>AFTER</command> triggers to monitor updates to
vital tables, and <command>INSERT</command> entries into log tables to
provide a more permanent record of those updates. </para>
</listitem>
<listitem><para> Replication </para>
<para> The <application>RServ</application> replication system uses
<command>AFTER</command> triggers to track which rows have changed on the
<quote>master</quote> system and therefore need to be copied over to
<quote>slave</quote> systems.</para>
<para>
<programlisting>
CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
    FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
</programlisting></para>
</listitem>
</itemizedlist>
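<para>
A validation trigger of the kind described above might be sketched as
follows. It assumes the <classname>weather</classname> table from
this chapter and that PL/pgSQL is installed in the database; the
function and trigger names are invented for the example:
<programlisting>
-- Hypothetical check: reject rows whose low temperature exceeds the high
CREATE FUNCTION weather_check() RETURNS trigger AS '
BEGIN
    IF NEW.temp_lo &gt; NEW.temp_hi THEN
        RAISE EXCEPTION ''temp_lo must not exceed temp_hi'';
    END IF;
    RETURN NEW;  -- let the row through unchanged
END;
' LANGUAGE plpgsql;

CREATE TRIGGER weather_check_trig BEFORE INSERT OR UPDATE ON weather
    FOR EACH ROW EXECUTE PROCEDURE weather_check();
</programlisting>
Because the check lives in the database, it applies to every client
that inserts or updates rows in <classname>weather</classname>.
</para>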
<para> Notice that there are strong parallels between what can be
accomplished using triggers and stored procedures, particularly
with regard to data validation. </para>
</sect1>
<sect1 id="tutorial-conclusion">
<title>Conclusion</title>
<para>
<productname>PostgreSQL</productname> has many features not
touched upon in this tutorial introduction, which has been
oriented toward newer users of <acronym>SQL</acronym>. These
features are discussed in more detail in the remainder of this
book.
</para>
<para>
If you feel you need more introductory material, please visit the
<ulink url="http://www.postgresql.org">PostgreSQL web
site</ulink> for links to more resources.
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->