2001-08-26 23:17:12 +02:00
|
|
|
<!--
|
2001-11-18 23:17:30 +01:00
|
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.7 2001/11/18 22:17:30 tgl Exp $
|
2001-08-26 23:17:12 +02:00
|
|
|
-->
|
|
|
|
|
|
|
|
<chapter id="maintenance">
|
|
|
|
<title>Routine Database Maintenance Tasks</title>
|
|
|
|
|
|
|
|
<sect1 id="maintenance-general">
|
|
|
|
<title>General Discussion</Title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
There are a few routine maintenance chores that must be performed on
|
|
|
|
a regular basis to keep a <productname>PostgreSQL</productname>
|
|
|
|
installation running smoothly. The tasks discussed here are repetitive
|
|
|
|
in nature and can easily be automated using standard Unix tools such
|
|
|
|
as <filename>cron</filename> scripts. But it is the database
|
|
|
|
administrator's responsibility to set up appropriate scripts, and to
|
|
|
|
check that they execute successfully.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
One obvious maintenance task is creation of backup copies of the data on a
|
|
|
|
regular schedule. Without a recent backup, you have no chance of recovery
|
|
|
|
after a catastrophe (disk failure, fire, mistakenly dropping a critical
|
|
|
|
table, etc). The backup and recovery mechanisms available in
|
|
|
|
<productname>PostgreSQL</productname> are discussed at length in
|
|
|
|
<xref linkend="backup">.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The other main category of maintenance task is periodic <quote>vacuuming</>
|
|
|
|
of the database. This activity is discussed in
|
|
|
|
<xref linkend="routine-vacuuming">.
|
|
|
|
</para>
|
|
|
|
|
2001-09-23 06:06:24 +02:00
|
|
|
<para>
|
|
|
|
It is also necessary to rotate the database server log file that is specified
|
|
|
|
when the <application>postmaster</application> is started. If you are using
|
|
|
|
<application>syslog</application>, you can send a <literal>SIGHUP</literal>
|
|
|
|
signal to the syslog daemon to force it to start writing a new log file.
|
|
|
|
</para>
|
2001-08-26 23:17:12 +02:00
|
|
|
|
|
|
|
<para>
|
|
|
|
<productname>PostgreSQL</productname> is low-maintenance compared to
|
|
|
|
some other database products. Nonetheless, appropriate attention to
|
|
|
|
these tasks will go far towards ensuring a pleasant and productive
|
|
|
|
experience with the system.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="routine-vacuuming">
|
|
|
|
<title>Routine Vacuuming</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="routine-vacuuming">
|
|
|
|
<primary>vacuum</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
<productname>PostgreSQL</productname>'s <command>VACUUM</> command must be
|
|
|
|
run on a regular basis for several reasons:
|
|
|
|
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
|
|
|
<simpara>To recover disk space occupied by updated or deleted
|
|
|
|
rows.</simpara>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<simpara>To update data statistics used by the
|
|
|
|
<productname>PostgreSQL</productname> query planner.</simpara>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<simpara>To protect against loss of very old data due to
|
|
|
|
<firstterm>transaction ID wraparound</>.</simpara>
|
|
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
|
|
|
|
The frequency and scope of <command>VACUUM</>s performed for each of
|
|
|
|
these reasons will vary depending on the needs of each installation.
|
|
|
|
Therefore, database administrators must understand these issues and
|
|
|
|
develop an appropriate maintenance strategy. This section concentrates
|
|
|
|
on explaining the high-level issues; for details about command syntax
|
|
|
|
and so on, see the <command>VACUUM</> command reference page.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Beginning in <productname>PostgreSQL</productname> 7.2, the standard form
|
|
|
|
of <command>VACUUM</> can run in parallel with normal database operations
|
|
|
|
(selects, inserts, updates, deletes, but not changes to table schemas).
|
|
|
|
Routine vacuuming is therefore not nearly as intrusive as it was in prior
|
|
|
|
releases, and it's not as critical to try to schedule it at low-usage
|
|
|
|
times of day.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2 id="vacuum-for-space-recovery">
|
|
|
|
<title>Recovering disk space</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="vacuum-for-space-recovery">
|
|
|
|
<primary>disk space</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
In normal <productname>PostgreSQL</productname> operation, an UPDATE or
|
|
|
|
DELETE of a row does not immediately remove the old <firstterm>tuple</>
|
|
|
|
(version of the row). This approach is necessary to gain the benefits
|
2001-11-18 23:17:30 +01:00
|
|
|
of multiversion concurrency control (see the <citetitle>User's Guide</>):
|
|
|
|
the tuple must not be deleted while
|
2001-08-26 23:17:12 +02:00
|
|
|
it is still potentially visible to other transactions. But eventually,
|
|
|
|
an outdated or deleted tuple is no longer of interest to any transaction.
|
|
|
|
The space it occupies must be reclaimed for reuse by new tuples, to avoid
|
|
|
|
infinite growth of disk space requirements. This is done by running
|
|
|
|
<command>VACUUM</>.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Clearly, a table that receives frequent updates or deletes will need
|
|
|
|
to be vacuumed more often than tables that are seldom updated. It may
|
2001-09-10 01:52:12 +02:00
|
|
|
be useful to set up periodic <application>cron</> tasks that vacuum only selected tables,
|
2001-08-26 23:17:12 +02:00
|
|
|
skipping tables that are known not to change often. This is only likely
|
|
|
|
to be helpful if you have both large heavily-updated tables and large
|
|
|
|
seldom-updated tables --- the extra cost of vacuuming a small table
|
|
|
|
isn't enough to be worth worrying about.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The standard form of <command>VACUUM</> is best used with the goal of
|
|
|
|
maintaining a fairly level steady-state usage of disk space. The standard
|
|
|
|
form finds old tuples and makes their space available for re-use within
|
|
|
|
the table, but it does not try very hard to shorten the table file and
|
|
|
|
return disk space to the operating system. If you need to return disk
|
|
|
|
space to the operating system you can use <command>VACUUM FULL</> ---
|
|
|
|
but what's the point of releasing disk space that will only have to be
|
|
|
|
allocated again soon? Moderately frequent standard <command>VACUUM</>s
|
|
|
|
are a better approach than infrequent <command>VACUUM FULL</>s for
|
|
|
|
maintaining heavily-updated tables.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Recommended practice for most sites is to schedule a database-wide
|
|
|
|
<command>VACUUM</> once a day at a low-usage time of day, supplemented
|
|
|
|
by more frequent vacuuming of heavily-updated tables if necessary.
|
|
|
|
(If you have multiple databases in an installation, don't forget to
|
|
|
|
vacuum each one; the <filename>vacuumdb</> script may be helpful.)
|
|
|
|
Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine
|
|
|
|
vacuuming for space recovery.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<command>VACUUM FULL</> is recommended for cases where you know you have
|
|
|
|
deleted the majority of tuples in a table, so that the steady-state size
|
|
|
|
of the table can be shrunk substantially with <command>VACUUM FULL</>'s
|
|
|
|
more aggressive approach.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If you have a table whose contents are deleted completely every so often,
|
|
|
|
consider doing it with <command>TRUNCATE</> rather than using
|
|
|
|
<command>DELETE</> followed by <command>VACUUM</>.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="vacuum-for-statistics">
|
|
|
|
<title>Updating planner statistics</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <productname>PostgreSQL</productname> query planner relies on
|
|
|
|
statistical information about the contents of tables in order to
|
|
|
|
generate good plans for queries. These statistics are gathered by
|
|
|
|
the <command>ANALYZE</> command, which can be invoked by itself or
|
|
|
|
as an optional step in <command>VACUUM</>. It is important to have
|
|
|
|
reasonably accurate statistics, otherwise poor choices of plans may
|
|
|
|
degrade database performance.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
As with vacuuming for space recovery, frequent updates of statistics
|
|
|
|
are more useful for heavily-updated tables than for seldom-updated ones.
|
|
|
|
But even for a heavily-updated table, there may be no need for
|
|
|
|
statistics updates if the statistical distribution of the data is not
|
|
|
|
changing much. A simple rule of thumb is to think about how much
|
|
|
|
the minimum and maximum values of the columns in the table change.
|
2001-09-10 01:52:12 +02:00
|
|
|
For example, a <type>timestamp</type> column that contains the time of row update
|
2001-08-26 23:17:12 +02:00
|
|
|
will have a constantly-increasing maximum value as rows are added and
|
|
|
|
updated; such a column will probably need more frequent statistics
|
|
|
|
updates than, say, a column containing URLs for pages accessed on a
|
|
|
|
website. The URL column may receive changes just as often, but the
|
|
|
|
statistical distribution of its values probably changes relatively
|
|
|
|
slowly.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
It is possible to run <command>ANALYZE</> on specific tables and even
|
|
|
|
just specific columns of a table, so the flexibility exists to update some
|
|
|
|
statistics more frequently than others if your application requires it.
|
|
|
|
In practice, however, the usefulness of this feature is doubtful.
|
|
|
|
Beginning in <productname>PostgreSQL</productname> 7.2,
|
|
|
|
<command>ANALYZE</> is a fairly fast operation even on large tables,
|
|
|
|
because it uses a statistical random sampling of the rows of a table
|
|
|
|
rather than reading every single row. So it's probably much simpler
|
|
|
|
to just run it over the whole database every so often.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<tip>
|
|
|
|
<para>
|
|
|
|
Although per-column tweaking of <command>ANALYZE</> frequency may not be
|
|
|
|
very productive, you may well find it worthwhile to do per-column
|
|
|
|
adjustment of the level of detail of the statistics collected by
|
|
|
|
<command>ANALYZE</>. Columns that are heavily used in WHERE clauses
|
|
|
|
and have highly irregular data distributions may require a finer-grain
|
|
|
|
data histogram than other columns. See <command>ALTER TABLE SET
|
|
|
|
STATISTICS</>.
|
|
|
|
</para>
|
|
|
|
</tip>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Recommended practice for most sites is to schedule a database-wide
|
|
|
|
<command>ANALYZE</> once a day at a low-usage time of day; this can
|
|
|
|
usefully be combined with a nightly <command>VACUUM</>. However,
|
|
|
|
sites with relatively slowly changing table statistics may find that
|
|
|
|
this is overkill, and that less-frequent <command>ANALYZE</> runs
|
|
|
|
are sufficient.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="vacuum-for-wraparound">
|
|
|
|
<title>Preventing transaction ID wraparound failures</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="vacuum-for-wraparound">
|
|
|
|
<primary>transaction ID</primary>
|
|
|
|
<secondary>wraparound</secondary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
<productname>PostgreSQL</productname>'s MVCC transaction semantics
|
|
|
|
depend on being able to compare transaction ID (<firstterm>XID</>)
|
|
|
|
numbers: a tuple with an insertion XID newer than the current
|
|
|
|
transaction's XID is <quote>in the future</> and should not be visible
|
|
|
|
to the current transaction. But since transaction IDs have limited size
|
|
|
|
(32 bits at this writing) an installation that runs for a long time (more
|
|
|
|
than 4 billion transactions) will suffer <firstterm>transaction ID
|
|
|
|
wraparound</>: the XID counter wraps around to zero, and all of a sudden
|
|
|
|
transactions that were in the past appear to be in the future --- which
|
2001-08-28 01:42:34 +02:00
|
|
|
means their outputs become invisible. In short, catastrophic data loss.
|
2001-08-26 23:17:12 +02:00
|
|
|
(Actually the data is still there, but that's cold comfort if you can't
|
|
|
|
get at it.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Prior to <productname>PostgreSQL</productname> 7.2, the only defense
|
2001-09-10 01:52:12 +02:00
|
|
|
against XID wraparound was to re-<command>initdb</> at least every 4 billion
|
2001-08-26 23:17:12 +02:00
|
|
|
transactions. This of course was not very satisfactory for high-traffic
|
|
|
|
sites, so a better solution has been devised. The new approach allows an
|
2001-09-10 01:52:12 +02:00
|
|
|
installation to remain up indefinitely, without <command>initdb</> or any sort of
|
2001-08-28 01:42:34 +02:00
|
|
|
restart. The price is this maintenance requirement:
|
2001-09-10 01:52:12 +02:00
|
|
|
<emphasis>every table in the database must be vacuumed at least once every
|
2001-08-28 01:42:34 +02:00
|
|
|
billion transactions</emphasis>.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In practice this isn't an onerous requirement, but since the consequences
|
|
|
|
of failing to meet it can be complete data loss (not just wasted disk
|
|
|
|
space or slow performance), some special provisions have been made to help
|
|
|
|
database administrators keep track of the time since the last
|
|
|
|
<command>VACUUM</>. The remainder of this section gives the details.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The new approach to XID comparison distinguishes two special XIDs, numbers
|
|
|
|
1 and 2 (<literal>BootstrapXID</> and <literal>FrozenXID</>). These two
|
|
|
|
XIDs are always considered older than every normal XID. Normal XIDs (those
|
|
|
|
greater than 2) are compared using modulo-2^31 arithmetic. This means
|
|
|
|
that for every normal XID, there are two billion XIDs that are
|
|
|
|
<quote>older</> and two billion that are <quote>newer</>; another way to
|
|
|
|
say it is that the normal XID space is circular with no endpoint.
|
|
|
|
Therefore, once a tuple has been created with a particular normal XID, the
|
|
|
|
tuple will appear to be <quote>in the past</> for the next two billion
|
|
|
|
transactions, no matter which normal XID we are talking about. If the
|
2001-08-28 01:42:34 +02:00
|
|
|
tuple still exists after more than two billion transactions, it will
|
2001-08-26 23:17:12 +02:00
|
|
|
suddenly appear to be in the future. To prevent data loss, old tuples
|
|
|
|
must be reassigned the XID <literal>FrozenXID</> sometime before they reach
|
|
|
|
the two-billion-transactions-old mark. Once they are assigned this
|
|
|
|
special XID, they will appear to be <quote>in the past</> to all normal
|
|
|
|
transactions regardless of wraparound issues, and so such tuples will be
|
|
|
|
good until deleted, no matter how long that is. This reassignment of
|
|
|
|
XID is handled by <command>VACUUM</>.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<command>VACUUM</>'s normal policy is to reassign <literal>FrozenXID</>
|
|
|
|
to any tuple with a normal XID more than one billion transactions in the
|
|
|
|
past. This policy preserves the original insertion XID until it is not
|
|
|
|
likely to be of interest anymore (in fact, most tuples will probably
|
2001-08-28 01:42:34 +02:00
|
|
|
live and die without ever being <quote>frozen</>). With this policy,
|
|
|
|
the maximum safe interval between <command>VACUUM</>s of any table
|
2001-08-26 23:17:12 +02:00
|
|
|
is exactly one billion transactions: if you wait longer, it's possible
|
|
|
|
that a tuple that was not quite old enough to be reassigned last time
|
|
|
|
is now more than two billion transactions old and has wrapped around
|
|
|
|
into the future --- ie, is lost to you. (Of course, it'll reappear
|
|
|
|
after another two billion transactions, but that's no help.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Since periodic <command>VACUUM</>s are needed anyway for the reasons
|
|
|
|
described earlier, it's unlikely that any table would not be vacuumed
|
|
|
|
for as long as a billion transactions. But to help administrators ensure
|
|
|
|
this constraint is met, <command>VACUUM</> stores transaction ID
|
|
|
|
statistics in the system table <filename>pg_database</>. In particular,
|
|
|
|
the <filename>datfrozenxid</> field of a database's
|
|
|
|
<filename>pg_database</> row is updated at the completion of any
|
|
|
|
database-wide vacuum operation (ie, <command>VACUUM</> that does not
|
|
|
|
name a specific table). The value stored in this field is the freeze
|
|
|
|
cutoff XID that was used by that <command>VACUUM</> command. All normal
|
|
|
|
XIDs older than this cutoff XID are guaranteed to have been replaced by
|
|
|
|
<literal>FrozenXID</> within that database. A convenient way to
|
|
|
|
examine this information is to execute the query
|
|
|
|
<informalexample>
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
SELECT datname, age(datfrozenxid) FROM pg_database;
|
2001-08-26 23:17:12 +02:00
|
|
|
</programlisting>
|
|
|
|
</informalexample>
|
|
|
|
The <literal>age</> column measures the number of transactions from the
|
|
|
|
cutoff XID to the current transaction's XID.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
With the standard freezing policy, the <literal>age</> column will start
|
|
|
|
at one billion for a freshly-vacuumed database. When the <literal>age</>
|
|
|
|
approaches two billion, the database must be vacuumed again to avoid
|
|
|
|
risk of wraparound failures. Recommended practice is to vacuum each
|
|
|
|
database at least once every half-a-billion (500 million) transactions,
|
|
|
|
so as to provide plenty of safety margin. To help meet this rule,
|
|
|
|
each database-wide <command>VACUUM</> automatically delivers a warning
|
|
|
|
if there are any <filename>pg_database</> entries showing an
|
|
|
|
<literal>age</> of more than 1.5 billion transactions, for example:
|
|
|
|
<informalexample>
|
|
|
|
<programlisting>
|
|
|
|
play=# vacuum;
|
|
|
|
NOTICE: Some databases have not been vacuumed in 1613770184 transactions.
|
|
|
|
Better vacuum them within 533713463 transactions,
|
|
|
|
or you may have a wraparound failure.
|
|
|
|
VACUUM
|
|
|
|
</programlisting>
|
|
|
|
</informalexample>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<command>VACUUM</> with the <command>FREEZE</> option uses a more
|
|
|
|
aggressive freezing policy: tuples are frozen if they are old enough
|
|
|
|
to be considered good by all open transactions. In particular, if
|
|
|
|
a <command>VACUUM FREEZE</> is performed in an otherwise-idle database,
|
|
|
|
it is guaranteed that <emphasis>all</> tuples in that database will be
|
2001-08-28 01:42:34 +02:00
|
|
|
frozen. Hence, as long as the database is not modified in any way, it
|
2001-08-26 23:17:12 +02:00
|
|
|
will not need subsequent vacuuming to avoid transaction ID wraparound
|
|
|
|
problems. This technique is used by <filename>initdb</> to prepare the
|
|
|
|
<filename>template0</> database. It should also be used to prepare any
|
|
|
|
user-created databases that are to be marked <literal>datallowconn</> =
|
|
|
|
<literal>false</> in <filename>pg_database</>, since there isn't any
|
|
|
|
convenient way to vacuum a database that you can't connect to. Note
|
2001-09-10 01:52:12 +02:00
|
|
|
that <command>VACUUM</command>'s automatic warning message about unvacuumed databases will
|
2001-08-26 23:17:12 +02:00
|
|
|
ignore <filename>pg_database</> entries with <literal>datallowconn</> =
|
|
|
|
<literal>false</>, so as to avoid giving false warnings about these
|
|
|
|
databases; therefore it's up to you to ensure that such databases are
|
|
|
|
frozen correctly.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode:sgml
|
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|