<!--
$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.46 2004/08/08 04:34:43 tgl Exp $
-->
<chapter id="backup">

<title>Backup and Restore</title>

<indexterm zone="backup"><primary>backup</></>
<para>
As with everything that contains valuable data, <productname>PostgreSQL</>
databases should be backed up regularly. While the procedure is
essentially simple, it is important to have a basic understanding of
the underlying techniques and assumptions.
</para>

<para>
There are three fundamentally different approaches to backing up
<productname>PostgreSQL</> data:
<itemizedlist>
<listitem><para><acronym>SQL</> dump</para></listitem>
<listitem><para>File system level backup</para></listitem>
<listitem><para>On-line backup</para></listitem>
</itemizedlist>
Each has its own strengths and weaknesses.
</para>
<sect1 id="backup-dump">
<title><acronym>SQL</> Dump</title>

<para>
The idea behind the SQL-dump method is to generate a text file with SQL
commands that, when fed back to the server, will recreate the
database in the same state as it was at the time of the dump.
<productname>PostgreSQL</> provides the utility program
<xref linkend="app-pgdump"> for this purpose. The basic usage of this
command is:
<synopsis>
pg_dump <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">outfile</replaceable>
</synopsis>
As you see, <application>pg_dump</> writes its results to the
standard output. We will see below how this can be useful.
</para>
<para>
<application>pg_dump</> is a regular <productname>PostgreSQL</>
client application (albeit a particularly clever one). This means
that you can do this backup procedure from any remote host that has
access to the database. But remember that <application>pg_dump</>
does not operate with special permissions. In particular, you must
have read access to all tables that you want to back up, so in
practice you almost always have to be a database superuser.
</para>
<para>
To specify which database server <application>pg_dump</> should
contact, use the command line options <option>-h
<replaceable>host</></> and <option>-p <replaceable>port</></>. The
default host is the local host or whatever your
<envar>PGHOST</envar> environment variable specifies. Similarly,
the default port is indicated by the <envar>PGPORT</envar>
environment variable or, failing that, by the compiled-in default.
(Conveniently, the server will normally have the same compiled-in
default.)
</para>
<para>
Like any other <productname>PostgreSQL</> client application,
<application>pg_dump</> will by default connect with the database
user name that is equal to the current operating system user name. To override
this, either specify the <option>-U</option> option or set the
environment variable <envar>PGUSER</envar>. Remember that
<application>pg_dump</> connections are subject to the normal
client authentication mechanisms (which are described in <xref
linkend="client-authentication">).
</para>
<para>
Dumps created by <application>pg_dump</> are internally consistent,
that is, updates to the database while <application>pg_dump</> is
running will not be in the dump. <application>pg_dump</> does not
block other operations on the database while it is working.
(Exceptions are those operations that need to operate with an
exclusive lock, such as <command>VACUUM FULL</command>.)
</para>
<important>
<para>
When your database schema relies on OIDs (for instance as foreign
keys) you must instruct <application>pg_dump</> to dump the OIDs
as well. To do this, use the <option>-o</option> command line
option. <quote>Large objects</> are not dumped by default,
either. See <xref linkend="app-pgdump">'s reference page if you
use large objects.
</para>
</important>
<sect2 id="backup-dump-restore">
<title>Restoring the dump</title>

<para>
The text files created by <application>pg_dump</> are intended to
be read in by the <application>psql</application> program. The
general command form to restore a dump is
<synopsis>
psql <replaceable class="parameter">dbname</replaceable> < <replaceable class="parameter">infile</replaceable>
</synopsis>
where <replaceable class="parameter">infile</replaceable> is what
you used as <replaceable class="parameter">outfile</replaceable>
for the <application>pg_dump</> command. The database <replaceable
class="parameter">dbname</replaceable> will not be created by this
command; you must create it yourself from <literal>template0</> before executing
<application>psql</> (e.g., with <literal>createdb -T template0
<replaceable class="parameter">dbname</></literal>).
<application>psql</> supports similar options to <application>pg_dump</>
for controlling the database server location and the user name. See
its reference page for more information.
</para>
<para>
If the objects in the original database were owned by different
users, then the dump will instruct <application>psql</> to connect
as each affected user in turn and then create the relevant
objects. This way the original ownership is preserved. This also
means, however, that all these users must already exist, and
furthermore that you must be allowed to connect as each of them.
It might therefore be necessary to temporarily relax the client
authentication settings.
</para>
<para>
Once restored, it is wise to run <xref linkend="sql-analyze"
endterm="sql-analyze-title"> on each database so the optimizer has
useful statistics. You can also run <command>vacuumdb -a -z</> to
<command>VACUUM ANALYZE</> all databases; this is equivalent to
running <command>VACUUM ANALYZE</command> manually in each database.
</para>
<para>
The ability of <application>pg_dump</> and <application>psql</> to
write to or read from pipes makes it possible to dump a database
directly from one server to another; for example:
<programlisting>
pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>host2</> <replaceable>dbname</>
</programlisting>
</para>
<important>
<para>
The dumps produced by <application>pg_dump</> are relative to
<literal>template0</>. This means that any languages, procedures,
etc. added to <literal>template1</> will also be dumped by
<application>pg_dump</>. As a result, when restoring, if you are
using a customized <literal>template1</>, you must create the
empty database from <literal>template0</>, as in the example
above.
</para>
</important>
<para>
For advice on how to load large amounts of data into
<productname>PostgreSQL</productname> efficiently, refer to <xref
linkend="populate">.
</para>
</sect2>
<sect2 id="backup-dump-all">
<title>Using <application>pg_dumpall</></title>

<para>
The above mechanism is cumbersome and inappropriate when backing
up an entire database cluster. For this reason the <xref
linkend="app-pg-dumpall"> program is provided.
<application>pg_dumpall</> backs up each database in a given
cluster, and also preserves cluster-wide data such as users and
groups. The basic usage of this command is:
<synopsis>
pg_dumpall > <replaceable>outfile</>
</synopsis>
The resulting dump can be restored with <application>psql</>:
<synopsis>
psql template1 < <replaceable class="parameter">infile</replaceable>
</synopsis>
(Actually, you can specify any existing database name to start from,
but if you are reloading in an empty cluster then <literal>template1</>
is the only available choice.) It is always necessary to have
database superuser access when restoring a <application>pg_dumpall</>
dump, as that is required to restore the user and group information.
</para>
</sect2>
<sect2 id="backup-dump-large">
<title>Large Databases</title>

<para>
Since <productname>PostgreSQL</productname> allows tables larger
than the maximum file size on your system, it can be problematic
to dump such a table to a file, since the resulting file will likely
be larger than the maximum size allowed by your system. Since
<application>pg_dump</> can write to the standard output, you can
just use standard Unix tools to work around this possible problem.
</para>
<formalpara>
<title>Use compressed dumps.</title>
<para>
You can use your favorite compression program, for example
<application>gzip</application>.

<programlisting>
pg_dump <replaceable class="parameter">dbname</replaceable> | gzip > <replaceable class="parameter">filename</replaceable>.gz
</programlisting>

Reload with

<programlisting>
createdb <replaceable class="parameter">dbname</replaceable>
gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
</programlisting>

or

<programlisting>
cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
</programlisting>
</para>
</formalpara>
<formalpara>
<title>Use <command>split</>.</title>
<para>
The <command>split</command> command
allows you to split the output into pieces that are
acceptable in size to the underlying file system. For example, to
make chunks of 1 megabyte:

<programlisting>
pg_dump <replaceable class="parameter">dbname</replaceable> | split -b 1m - <replaceable class="parameter">filename</replaceable>
</programlisting>

Reload with

<programlisting>
createdb <replaceable class="parameter">dbname</replaceable>
cat <replaceable class="parameter">filename</replaceable>* | psql <replaceable class="parameter">dbname</replaceable>
</programlisting>
</para>
</formalpara>
<formalpara>
<title>Use the custom dump format.</title>
<para>
If <productname>PostgreSQL</productname> was built on a system with the
<application>zlib</> compression library installed, the custom dump
format will compress data as it writes it to the output file. For
large databases, this will produce similar dump sizes to using
<command>gzip</command>, but has the added advantage that the tables
can be restored selectively. The following command dumps a database
using the custom dump format:

<programlisting>
pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">filename</replaceable>
</programlisting>

See the <xref linkend="app-pgdump"> and <xref
linkend="app-pgrestore"> reference pages for details.
</para>
</formalpara>
</sect2>
<sect2 id="backup-dump-caveats">
<title>Caveats</title>

<para>
For reasons of backward compatibility, <application>pg_dump</>
does not dump large objects by default.<indexterm><primary>large
object</primary><secondary>backup</secondary></indexterm> To dump
large objects you must use either the custom or the tar output
format, and use the <option>-b</> option in
<application>pg_dump</>. See the reference pages for details. The
directory <filename>contrib/pg_dumplo</> of the
<productname>PostgreSQL</> source tree also contains a program
that can dump large objects.
</para>

<para>
Please familiarize yourself with the <xref linkend="app-pgdump">
reference page.
</para>
</sect2>

</sect1>
<sect1 id="backup-file">
<title>File system level backup</title>

<para>
An alternative backup strategy is to directly copy the files that
<productname>PostgreSQL</> uses to store the data in the database. In
<xref linkend="creating-cluster"> it is explained where these files
are located, but you have probably found them already if you are
interested in this method. You can use whatever method you prefer
for doing usual file system backups, for example

<programlisting>
tar -cf backup.tar /usr/local/pgsql/data
</programlisting>
</para>
<para>
There are two restrictions, however, which make this method
impractical, or at least inferior to the <application>pg_dump</>
method:

<orderedlist>
<listitem>
<para>
The database server <emphasis>must</> be shut down in order to
get a usable backup. Half-way measures such as disallowing all
connections will <emphasis>not</emphasis> work
(mainly because <command>tar</command> and similar tools do not take an
atomic snapshot of the state of the filesystem at a point in
time). Information about stopping the server can be found in
<xref linkend="postmaster-shutdown">. Needless to say, you
also need to shut down the server before restoring the data.
</para>
</listitem>
<listitem>
<para>
If you have dug into the details of the file system layout of the
database, you may be tempted to try to back up or restore only certain
individual tables or databases from their respective files or
directories. This will <emphasis>not</> work because these files
contain only half the truth. The other half is in the commit log files
<filename>pg_clog/*</filename>, which contain the commit status of
all transactions. A table file is only usable with this
information. Of course it is also impossible to restore only a
table and the associated <filename>pg_clog</filename> data
because that would render all other tables in the database
cluster useless. So file system backups only work for complete
restoration of an entire database cluster.
</para>
</listitem>
</orderedlist>
</para>
<para>
An alternative file-system backup approach is to make a
<quote>consistent snapshot</quote> of the data directory, if the
file system supports that functionality (and you are willing to
trust that it is implemented correctly). The typical procedure is
to make a <quote>frozen snapshot</> of the volume containing the
database, then copy the whole data directory (not just parts, see
above) from the snapshot to a backup device, then release the frozen
snapshot. This will work even while the database server is running.
However, a backup created in this way saves
the database files in a state where the database server was not
properly shut down; therefore, when you start the database server
on the backed-up data, it will think the previous server instance
crashed and replay the WAL log. This is not a problem, just be aware of
it (and be sure to include the WAL files in your backup).
</para>
<para>
If your database is spread across multiple volumes (for example,
data files and WAL log on different disks) there may not be any way
to obtain exactly-simultaneous frozen snapshots of all the volumes.
Read your filesystem documentation very carefully before trusting
to the consistent-snapshot technique in such situations.
</para>
<para>
Note that a file system backup will not necessarily be
smaller than an SQL dump. On the contrary, it will most likely be
larger. (<application>pg_dump</application> does not need to dump
the contents of indexes for example, just the commands to recreate
them.)
</para>
</sect1>
<sect1 id="backup-online">
<title>On-line backup and point-in-time recovery</title>

<indexterm zone="backup">
<primary>on-line backup</primary>
</indexterm>

<indexterm zone="backup">
<primary>point-in-time recovery</primary>
</indexterm>
<para>
At all times, <productname>PostgreSQL</> maintains a <firstterm>write ahead
log</> (WAL) that shows details of every change made to the database's data
files. This log exists primarily for crash-safety purposes: if the system
crashes, the database can be restored to consistency by <quote>replaying</>
the log entries made since the last checkpoint. However, the existence
of the log makes it possible to use a third strategy for backing up
databases: we can combine a filesystem-level backup with backup of the WAL
files. If recovery is needed, we restore the backup and then replay from
the backed-up WAL files to bring the backup up to current time. This
approach is notably more complex to administer than either of the previous
approaches, but it has some significant benefits to offer:
<itemizedlist>
<listitem>
<para>
We do not need a perfectly consistent backup as the starting point.
Any internal inconsistency in the backup will be corrected by log
replay (this is not significantly different from what happens during
crash recovery). So we don't need filesystem snapshot capability,
just <application>tar</> or a similar archiving tool.
</para>
</listitem>
<listitem>
<para>
Since we can string together an indefinitely long sequence of WAL files
for replay, continuous backup can be had simply by continuing to archive
the WAL files. This is particularly valuable for large databases, where
making a full backup may take an unreasonable amount of time.
</para>
</listitem>
<listitem>
<para>
There is nothing that says we have to replay the WAL entries all the
way to the end. We could stop the replay at any point and have a
consistent snapshot of the database as it was at that time. Thus,
this technique supports <firstterm>point-in-time recovery</>: it is
possible to restore the database to its state at any time since your base
backup was taken.
</para>
</listitem>
<listitem>
<para>
If we continuously feed the series of WAL files to another machine
that's been loaded with the same base backup file, we have a <quote>hot
standby</> system: at any point we can bring up the second machine
and it will have a nearly-current copy of the database.
</para>
</listitem>
</itemizedlist>
</para>
<para>
As with the plain filesystem-backup technique, this method can only
support restoration of an entire database cluster, not a subset.
Also, it requires a lot of archival storage: the base backup is bulky,
and a busy system will generate many megabytes of WAL traffic that
have to be archived. Still, it is the preferred backup technique in
many situations where high reliability is needed.
</para>
<para>
To recover successfully using an on-line backup, you need a continuous
sequence of archived WAL files that extends back at least as far as the
start time of your backup. So to get started, you should set up and test
your procedure for archiving WAL files <emphasis>before</> you take your
first base backup. Accordingly, we first discuss the mechanics of
archiving WAL files.
</para>
<sect2 id="backup-archiving-wal">
<title>Setting up WAL archiving</title>

<para>
In an abstract sense, a running <productname>PostgreSQL</> system
produces an indefinitely long sequence of WAL records. The system
physically divides this sequence into WAL <firstterm>segment files</>,
which are normally 16MB apiece (although the size can be altered when
building the server). The segment files are given numeric names that
reflect their position in the abstract WAL sequence. When not using WAL
archiving, the system normally creates just a few segment files and then
<quote>recycles</> them by renaming no-longer-needed segment files to
higher segment numbers. It's assumed that a segment file whose contents
precede the checkpoint-before-last is no longer of interest and can be
recycled.
</para>
<para>
When archiving WAL data, we want to capture the contents of each segment
file once it is filled, and save that data somewhere before the segment
file is recycled for reuse. Depending on the application and the
available hardware, there could be many different ways of <quote>saving
the data somewhere</>: we could copy the segment files to an NFS-mounted
directory on another machine, or write them onto a tape drive, or batch
them together and burn them onto CDs, or something else entirely. To
provide the database administrator with as much flexibility as possible,
<productname>PostgreSQL</> tries not to make any assumptions about how
the archiving will be done. Instead, <productname>PostgreSQL</> lets
the administrator specify a shell command to be executed to copy a
completed segment file to wherever it needs to go. The command could be
as simple as a <application>cp</>, or it could invoke a complex shell
script --- it's all up to you.
</para>
<para>
The shell command to use is specified by the <xref
linkend="guc-archive-command"> configuration parameter, which in practice
will always be placed in the <filename>postgresql.conf</filename> file.
In this string,
any <literal>%p</> is replaced by the absolute path of the file to
archive, while any <literal>%f</> is replaced by the file name only.
Write <literal>%%</> if you need to embed an actual <literal>%</>
character in the command. The simplest useful command is something
like
<programlisting>
archive_command = 'cp %p /mnt/server/archivedir/%f'
</programlisting>
which will copy archivable WAL segments to the directory
<literal>/mnt/server/archivedir</>.
</para>
<para>
The archive command will be executed under the ownership of the same
user that the <productname>PostgreSQL</> server is running as. Since
the series of WAL files being archived contains effectively everything
in your database, you will want to be sure that the archived data is
protected from prying eyes; for example, archive into a directory that
does not have group or world read access.
</para>
<para>
It is important that the archive command return zero exit status if and
only if it succeeded. Upon getting a zero result,
<productname>PostgreSQL</> will assume that the WAL segment file has been
successfully archived, and it may be overwritten with new data very
soon thereafter. However, a nonzero status tells
<productname>PostgreSQL</> that the file was not archived; it will try
again periodically until it succeeds.
</para>
<para>
Speed of the archiving command is not important, so long as it can keep up
with the average rate at which your server generates WAL data. It is okay
if the archiving process falls a little behind (or even a lot behind, if
you don't mind the <literal>pg_xlog/</> directory filling up with
not-yet-archived segment files).
</para>
<para>
If you are concerned about being able to recover right up to the current
instant, you may want to take additional steps to ensure that the current,
partially-filled WAL segment is also copied someplace. This is
particularly important if your server generates little WAL traffic
(or has slack periods where it does so), since it could take a long time
before a WAL segment file is completely filled and ready to archive.
One possible way to handle this is to set up a <application>cron</> job
that periodically (once a minute, perhaps) identifies the current WAL
segment file and saves it someplace safe. The combination of the archived
WAL segments and the saved current segment will then be enough to ensure
you can always restore to within a minute of current time. This behavior
is not presently built into <productname>PostgreSQL</> because we did not
want to complicate the definition of the <xref
linkend="guc-archive-command"> by requiring it to keep track of
successively archived, but different, copies of the same WAL file.
The <xref linkend="guc-archive-command"> is only invoked on finished
WAL segments that will not change anymore; and except in the case of
retrying a failure, it will be called only once for any given file name.
</para>
<para>
In writing your archive command, you should assume that the filenames to
be archived may be up to 64 characters long and may contain any
combination of ASCII letters, digits, and dots. It is not necessary to
remember the original full path (<literal>%p</>) but it is necessary to
remember the file name (<literal>%f</>).
</para>
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="backup-base-backup">
<title>Making a Base Backup</title>

<para>
The procedure for making a base backup is relatively simple:
<orderedlist>
<listitem>
<para>
Ensure that WAL archiving is enabled and working.
</para>
</listitem>
<listitem>
<para>
Connect to the database as a superuser, and issue the command
<programlisting>
SELECT pg_start_backup('label');
</programlisting>
where <literal>label</> is any string you want to use to uniquely
identify this backup operation. (One good practice is to use the
full path where you intend to put the backup dump file.) It does
not matter which database within the cluster you connect to when
issuing this command. You can ignore the result returned by the
function; but if it reports an error, deal with that before proceeding.
</para>
</listitem>
<listitem>
<para>
Perform the backup, using any convenient file-system backup tool
such as <application>tar</> or <application>cpio</>. It is neither
necessary nor desirable to stop normal operation of the database
while you do this.
</para>
</listitem>
<listitem>
<para>
Again connect to the database as a superuser, and issue the command
<programlisting>
SELECT pg_stop_backup();
</programlisting>
If this returns successfully, you're done.
</para>
</listitem>
</orderedlist>
</para>
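
<para>
For instance, assuming the cluster lives in
<literal>/usr/local/pgsql/data</> and the backup is to be written to
<literal>/mnt/server/backups</> (both paths being merely illustrative),
the whole procedure might look like:
<programlisting>
psql -c "SELECT pg_start_backup('/mnt/server/backups/base-20040808')" template1
tar -cf /mnt/server/backups/base-20040808.tar /usr/local/pgsql/data
psql -c "SELECT pg_stop_backup()" template1
</programlisting>
</para>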

<para>
It is not necessary to be concerned about the amount of time elapsed
between <function>pg_start_backup</> and the start of the actual backup,
nor between the end of the backup and <function>pg_stop_backup</>; a
few minutes' delay won't hurt anything. You must, however, make sure
that these operations are carried out in sequence and do not overlap.
</para>

<para>
Be certain that your backup dump includes all of the files underneath
the database cluster directory (e.g., <literal>/usr/local/pgsql/data</>).
If you are using tablespaces that do not reside underneath this directory,
be careful to include them as well (and be sure that your backup dump
archives symbolic links as links, otherwise the restore will mess up
your tablespaces).
</para>

<para>
You may, however, omit from the backup dump the files within the
<literal>pg_xlog/</> subdirectory of the cluster directory. This
slight complication is worthwhile because it reduces the risk
of mistakes when restoring. This is easy to arrange if
<literal>pg_xlog/</> is a symbolic link pointing to someplace outside
the cluster directory, which is a common setup anyway for performance
reasons.
</para>
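
<para>
One way to arrange this is to use the exclusion patterns of
<application>tar</> (the <literal>--exclude</> option is a
<acronym>GNU</> <application>tar</> feature). The sketch below uses a
throwaway directory standing in for the real cluster directory, with
illustrative file names. The <literal>pg_xlog/</> directory entry
itself is kept, but the segment files within it are omitted:
<programlisting>
# a throwaway directory standing in for the real cluster directory
DATA=`mktemp -d`
mkdir "$DATA/base" "$DATA/pg_xlog"
echo 'table data'  > "$DATA/base/16384"
echo 'wal segment' > "$DATA/pg_xlog/000000010000000000000001"

BACKUP=`mktemp`
tar -cf "$BACKUP" --exclude='pg_xlog/*' -C "$DATA" .

tar -tf "$BACKUP"    # lists ./base/16384 and ./pg_xlog/, but no segment file
</programlisting>
</para>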

<para>
To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup.
To aid you in doing this, the <function>pg_stop_backup</> function
creates a <firstterm>backup history file</> that is immediately stored
into the WAL archive area. This file is named after the first WAL
segment file that you need to have to make use of the backup. For
example, if the starting WAL file is <literal>0000000100001234000055CD</>
the backup history file will be named something like
<literal>0000000100001234000055CD.007C9330.backup</>. (The second part of
this file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one. The backup history file is just a small text file.
It contains the label string you gave to <function>pg_start_backup</>, as
well as the starting and ending times of the backup. If you used the
label to identify where the associated dump file is kept, then the
archived history file is enough to tell you which dump file to restore,
should you need to do so.
</para>

<para>
Since you have to keep around all the archived WAL files back to your
last full dump, the interval between full dumps is usually chosen
based on how much storage you want to expend on archived WAL files.
You should also consider how long you are prepared to spend recovering,
if recovery should be necessary --- the system will have to replay all
those segments, and that could take a while if it has been a long time
since the full dump.
</para>

<para>
It's also worth noting that the <function>pg_start_backup</> function
makes a file named <literal>backup_label</> in the database cluster
directory, which is then removed again by <function>pg_stop_backup</>.
This file will of course be archived as a part of your backup dump file.
The backup label file includes the label string you gave to
<function>pg_start_backup</>, as well as the time at which
<function>pg_start_backup</> was run, and the name of the starting WAL
file. In case of confusion it will therefore be possible to look inside a
backup dump file and determine exactly which backup session the dump file
came from.
</para>

<para>
It is also possible to make a backup dump while the postmaster is
stopped. In this case you obviously cannot use
<function>pg_start_backup</> or <function>pg_stop_backup</>, and
you will therefore be left to your own devices to keep track of which
backup dump is which and how far back the associated WAL files go.
It is generally better to follow the on-line backup procedure above.
</para>
</sect2>

<sect2 id="backup-pitr-recovery">
<title>Recovering with an On-line Backup</title>

<para>
Okay, the worst has happened and you need to recover from your backup.
Here is the procedure:
<orderedlist>
<listitem>
<para>
Stop the postmaster, if it's running, and clean out all existing files
under the cluster data directory and under the root directories of any
tablespaces you are using.
(If there are recent, unarchived WAL segment files in
<literal>pg_xlog/</> that you want to use during restore, move these aside
instead of removing them.)
</para>
</listitem>
<listitem>
<para>
Restore the database files from your backup dump. Be careful that they
are restored with the right ownership (the database system user, not
root!) and with the right permissions. If you are using tablespaces,
you may want to verify that the symbolic links in <literal>pg_tblspc/</>
were correctly restored.
</para>
</listitem>
<listitem>
<para>
Remove any files present in <literal>pg_xlog/</>; these came from the
backup dump and are therefore probably obsolete rather than current.
If you didn't archive <literal>pg_xlog/</> at all, then re-create it,
and be sure to re-create the subdirectory
<literal>pg_xlog/archive_status/</> as well.
</para>
</listitem>
<listitem>
<para>
If you had unarchived WAL segment files that you saved aside in step 1,
copy them into <literal>pg_xlog/</>. (It's best to copy them, not move
them back in, so that you still have the unmodified files if the worst
happens and you have to start over.)
</para>
</listitem>
<listitem>
<para>
Create a recovery command file <literal>recovery.conf</> in the cluster
data directory, as discussed below. You may also want to temporarily
modify <literal>pg_hba.conf</> to prevent ordinary users from connecting
until you are sure the recovery has worked.
</para>
</listitem>
<listitem>
<para>
Start the postmaster. The postmaster will go into recovery mode and
proceed to read through the archived WAL files it needs. Upon completion
of the recovery process, the postmaster will rename
<literal>recovery.conf</> to <literal>recovery.done</> (to prevent
accidentally re-entering recovery mode in case of a crash later) and then
commence normal database operations.
</para>
</listitem>
<listitem>
<para>
Inspect the contents of the database to ensure you have recovered to
where you want to be. If not, return to step 1. If all is well,
let in your users by restoring <literal>pg_hba.conf</> to normal.
</para>
</listitem>
</orderedlist>
</para>

<para>
The key part of all this is to set up a recovery command file that
describes how you want to recover and how far the recovery should run.
You can use <literal>recovery.conf.sample</> (normally installed in the
installation's <literal>share/</> directory) as a prototype. The one
thing that you absolutely must specify in <literal>recovery.conf</> is
the <literal>restore_command</>, which tells how to retrieve archived
WAL file segments. Like the <literal>archive_command</>, this is a
shell command string. It may contain <literal>%f</>, which is replaced
by the name of the desired log file, and <literal>%p</>, which is
replaced by the absolute path to copy the log file to. Write
<literal>%%</> if you need to embed an actual <literal>%</> character
in the command. The simplest useful command is something like
<programlisting>
restore_command = 'cp /mnt/server/archivedir/%f %p'
</programlisting>
which will copy previously archived WAL segments from the directory
<literal>/mnt/server/archivedir</>. You could of course use something
much more complicated, perhaps even a shell script that requests the
operator to mount an appropriate tape.
</para>

<para>
It is important that the command return a nonzero exit status on failure.
The command <emphasis>will</> be asked for log files that are not present
in the archive; it must return nonzero when so asked. This is not an
error condition. Be aware also that the base name of the <literal>%p</>
path will be different from <literal>%f</>; do not expect them to be
interchangeable.
</para>
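
<para>
The same contract can be sketched as a shell function operating on
throwaway directories (all names here are illustrative): copy the file
when the archive has it, and quietly return a nonzero status when it
does not.
<programlisting>
# a stand-in for the real archive directory, holding one archived segment
ARCHIVE_DIR=`mktemp -d`
RESTORE_DIR=`mktemp -d`
echo 'archived WAL' > "$ARCHIVE_DIR/000000010000000000000001"

# $1 is the file name asked for (%f); $2 is where to put it (%p)
restore_wal () {
    if test -f "$ARCHIVE_DIR/$1"; then
        cp "$ARCHIVE_DIR/$1" "$2"
    else
        return 1    # not an error: the server simply asked for a file we lack
    fi
}

restore_wal 000000010000000000000001 "$RESTORE_DIR/RECOVERYXLOG"
restore_wal 000000010000000000000099 "$RESTORE_DIR/RECOVERYXLOG2" \
    || echo "segment not in archive"
</programlisting>
</para>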

<para>
WAL segments that cannot be found in the archive will be sought in
<literal>pg_xlog/</>; this allows use of recent un-archived segments.
However, segments that are available from the archive will be used in
preference to files in <literal>pg_xlog/</>. The system will not
overwrite the existing contents of <literal>pg_xlog/</> when retrieving
archived files.
</para>

<para>
Normally, recovery will proceed through all available WAL segments,
thereby restoring the database to current time (or as close as we can
get given the available WAL segments). But if you want to recover to
some previous point in time (say, right before the junior DBA dropped your
main transaction table), just specify the required stopping point in
<literal>recovery.conf</>. You can specify the stop point either by
date/time or by completion of a specific transaction ID. The stop
specification can be inclusive or exclusive. As of this writing
only the date/time option is very usable, since there are no tools
to help you identify which transaction ID to use. Keep in mind
that while transaction IDs are assigned sequentially at transaction
start, transactions can complete in a different numeric order.
</para>
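
<para>
For instance, a <literal>recovery.conf</> that stops replay just before
the junior DBA's mistake might look like this (the directory and the
timestamp are of course only examples):
<programlisting>
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2004-07-14 22:39:00 EST'
</programlisting>
</para>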

<para>
Note that the stop point must be after the ending time of the backup
(i.e., the time of <function>pg_stop_backup</>). You cannot use a base
backup to recover to a time when that backup was still in progress. (To
recover to such a time, you must go back to your previous base backup
and roll forward from there.)
</para>
</sect2>

<sect2 id="backup-timelines">
<title>Timelines</title>

<indexterm zone="backup">
<primary>timelines</primary>
</indexterm>

<para>
The ability to restore the database to a previous point in time creates
some complexities that are akin to science-fiction stories about time
travel and parallel universes. In the original history of the database,
perhaps you dropped a critical table at 5:15PM on Tuesday evening.
Unfazed, you get out your backup, restore to the point in time 5:14PM
Tuesday evening, and are up and running. In <emphasis>this</> history of
the database universe, you never dropped the table at all. But suppose
you later realize this wasn't such a great idea after all, and would like
to return to some later point in the original history. You won't be able
to if, while your database was up and running, it overwrote some of the
sequence of WAL segment files that led up to the time you now wish you
could get back to. So you really want to distinguish the series of
WAL records generated after you've done a point-in-time recovery from
those that were generated in the original database history.
</para>

<para>
To deal with these problems, <productname>PostgreSQL</> has a notion
of <firstterm>timelines</>. Each time you recover to a point in time
earlier than the end of the WAL sequence, a new timeline is created
to identify the series of WAL records generated after that recovery.
(If recovery proceeds all the way to the end of WAL, however, we do not
start a new timeline: we just extend the existing one.) The timeline
ID number is part of WAL segment file names, and so a new timeline does
not overwrite the WAL data generated by previous timelines. It is
in fact possible to archive many different timelines. While that might
seem like a useless feature, it's often a lifesaver. Consider the
situation where you aren't quite sure what point in time to recover to,
and so have to do several point-in-time recoveries by trial and error
until you find the best place to branch off from the old history. Without
timelines this process would soon generate an unmanageable mess. With
timelines, you can recover to <emphasis>any</> prior state, including
states in timeline branches that you later abandoned.
</para>

<para>
Each time a new timeline is created, <productname>PostgreSQL</> creates
a <quote>timeline history</> file that shows which timeline it branched
off from and when. These history files are necessary to allow the system
to pick the right WAL segment files when recovering from an archive that
contains multiple timelines. Therefore, they are archived into the WAL
archive area just like WAL segment files. The history files are just
small text files, so it's cheap and appropriate to keep them around
indefinitely (unlike the segment files which are large). You can, if
you like, add comments to a history file to make your own notes about
how and why this particular timeline came to be. Such comments will be
especially valuable when you have a thicket of different timelines as
a result of experimentation.
</para>

<para>
The default behavior of recovery is to recover along the same timeline
that was current when the base backup was taken. If you want to recover
into some child timeline (that is, you want to return to some state that
was itself generated after a recovery attempt), you need to specify the
target timeline in <literal>recovery.conf</>. You cannot recover into
timelines that branched off earlier than the base backup.
</para>
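
<para>
For example, to recover into the child timeline numbered 2 (an
illustrative value; the timeline history files in your archive show the
actual IDs), the <literal>recovery.conf</> entries might be:
<programlisting>
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_timeline = '2'
</programlisting>
</para>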
</sect2>

<sect2 id="backup-online-caveats">
<title>Caveats</title>

<para>
At this writing, there are several limitations of the on-line backup
technique. These will probably be fixed in future releases.

<itemizedlist>
<listitem>
<para>
The effects of <command>CREATE DATABASE</>, <command>DROP DATABASE</>,
<command>CREATE TABLESPACE</>, and <command>DROP TABLESPACE</> are
not fully reflected in the WAL log. It is recommended that you take
a new base backup after performing one of these operations.
</para>
</listitem>
<listitem>
<para>
Operations on non-B-tree indexes (hash, R-tree, and GiST indexes) are
not presently WAL-logged, so replay will not update these index types.
The recommended workaround, if you use any non-B-tree indexes, is to
manually <command>REINDEX</> each such index after completing a
recovery operation.
</para>
</listitem>
</itemizedlist>
</para>
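
<para>
For example, if you had a GiST index (the index name here is
hypothetical), you would rebuild it after recovery with:
<programlisting>
REINDEX INDEX my_gist_idx;
</programlisting>
</para>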

<para>
It should also be noted that the present <acronym>WAL</acronym>
format is extremely bulky since it includes many disk page
snapshots. This is appropriate for crash recovery purposes,
since we may need to fix partially-written disk pages. It is not
necessary to store so many page copies for PITR operations, however.
An area for future development is to compress archived WAL data by
removing unnecessary page copies.
</para>
</sect2>

</sect1>

<sect1 id="migration">
<title>Migration Between Releases</title>

<indexterm zone="migration">
<primary>upgrading</primary>
</indexterm>

<indexterm zone="migration">
<primary>version</primary>
<secondary>compatibility</secondary>
</indexterm>

<para>
As a general rule, the internal data storage format is subject to
change between major releases of <productname>PostgreSQL</> (where
the number after the first dot changes). This does not apply to
different minor releases under the same major release (where the
number after the second dot changes); these always have compatible
storage formats. For example, releases 7.0.1, 7.1.2, and 7.2 are
not compatible, whereas 7.1.1 and 7.1.2 are. When you update
between compatible versions, you can simply replace the executables
and reuse the data area on disk. Otherwise you need to
<quote>back up</> your data and <quote>restore</> it on the new
server, using <application>pg_dump</>. (There are checks in place
that prevent you from doing the wrong thing, so no harm can be done
by confusing these things.) The precise installation procedure is
not the subject of this section; those details are in <xref
linkend="installation">.
</para>

<para>
The least downtime can be achieved by installing the new server in
a different directory and running both the old and the new servers
in parallel, on different ports. Then you can use something like

<programlisting>
pg_dumpall -p 5432 | psql -d template1 -p 6543
</programlisting>

to transfer your data, or use an intermediate file if you wish.
Then you can shut down the old server and start the new server at
the port the old one was running at. You should make sure that the
database is not updated after you run <application>pg_dumpall</>,
otherwise you will obviously lose that data. See <xref
linkend="client-authentication"> for information on how to prohibit
access. In practice you probably want to test your client
applications on the new setup before switching over.
</para>

<para>
If you cannot or do not want to run two servers in parallel, you can
do the backup step before installing the new version, bring down
the server, move the old version out of the way, install the new
version, start the new server, and restore the data. For example:

<programlisting>
pg_dumpall > backup
pg_ctl stop
mv /usr/local/pgsql /usr/local/pgsql.old
cd ~/postgresql-&version;
gmake install
initdb -D /usr/local/pgsql/data
postmaster -D /usr/local/pgsql/data
psql template1 < backup
</programlisting>

See <xref linkend="runtime"> about ways to start and stop the
server and other details. The installation instructions will advise
you of strategic places to perform these steps.
</para>

<para>
You will always need a SQL dump (a <application>pg_dump</> dump) for
migrating to a new release. File-system-level backups (including
on-line backups) will not work, for the same reason that you can't
just do the update in place: the file formats won't necessarily be
compatible across major releases.
</para>

<note>
<para>
When you <quote>move the old installation out of the way</quote>
it is no longer perfectly usable. Some parts of the installation
contain information about where the other parts are located. This
is usually not a big problem, but if you plan on using two
installations in parallel for a while you should assign them
different installation directories at build time.
</para>
</note>
</sect1>

</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-tabs-mode:nil
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/share/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->