2001-08-26 23:17:12 +02:00
|
|
|
<!--
|
2005-05-11 00:27:30 +02:00
|
|
|
$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.43 2005/05/10 22:27:29 momjian Exp $
|
2001-08-26 23:17:12 +02:00
|
|
|
-->
|
|
|
|
|
|
|
|
<chapter id="maintenance">
|
|
|
|
<title>Routine Database Maintenance Tasks</title>
|
|
|
|
|
2003-08-31 19:32:24 +02:00
|
|
|
<indexterm zone="maintenance">
|
|
|
|
<primary>maintenance</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
There are a few routine maintenance chores that must be performed on
|
|
|
|
a regular basis to keep a <productname>PostgreSQL</productname>
|
2003-03-24 15:32:51 +01:00
|
|
|
server running smoothly. The tasks discussed here are repetitive
|
2001-08-26 23:17:12 +02:00
|
|
|
in nature and can easily be automated using standard Unix tools such
|
2002-11-11 21:14:04 +01:00
|
|
|
as <application>cron</application> scripts. But it is the database
|
2001-08-26 23:17:12 +02:00
|
|
|
administrator's responsibility to set up appropriate scripts, and to
|
|
|
|
check that they execute successfully.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
One obvious maintenance task is creation of backup copies of the data on a
|
|
|
|
regular schedule. Without a recent backup, you have no chance of recovery
|
|
|
|
after a catastrophe (disk failure, fire, mistakenly dropping a critical
|
2003-03-24 15:32:51 +01:00
|
|
|
table, etc.). The backup and recovery mechanisms available in
|
2001-08-26 23:17:12 +02:00
|
|
|
<productname>PostgreSQL</productname> are discussed at length in
|
|
|
|
<xref linkend="backup">.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The other main category of maintenance task is periodic <quote>vacuuming</>
|
|
|
|
of the database. This activity is discussed in
|
|
|
|
<xref linkend="routine-vacuuming">.
|
|
|
|
</para>
|
|
|
|
|
2001-09-23 06:06:24 +02:00
|
|
|
<para>
|
2001-11-20 05:27:49 +01:00
|
|
|
Something else that might need periodic attention is log file management.
|
|
|
|
This is discussed in <xref linkend="logfile-maintenance">.
|
2001-09-23 06:06:24 +02:00
|
|
|
</para>
|
2001-08-26 23:17:12 +02:00
|
|
|
|
|
|
|
<para>
|
2003-11-04 10:55:39 +01:00
|
|
|
<productname>PostgreSQL</productname> is low-maintenance compared
|
|
|
|
to some other database management systems. Nonetheless,
|
|
|
|
appropriate attention to these tasks will go far towards ensuring a
|
|
|
|
pleasant and productive experience with the system.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect1 id="routine-vacuuming">
|
|
|
|
<title>Routine Vacuuming</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="routine-vacuuming">
|
|
|
|
<primary>vacuum</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
2002-06-23 05:37:12 +02:00
|
|
|
<productname>PostgreSQL</productname>'s <command>VACUUM</> command
|
|
|
|
must be run on a regular basis for several reasons:
|
2001-08-26 23:17:12 +02:00
|
|
|
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
|
|
|
<simpara>To recover disk space occupied by updated or deleted
|
|
|
|
rows.</simpara>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<simpara>To update data statistics used by the
|
|
|
|
<productname>PostgreSQL</productname> query planner.</simpara>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<simpara>To protect against loss of very old data due to
|
|
|
|
<firstterm>transaction ID wraparound</>.</simpara>
|
|
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
|
2003-03-24 15:32:51 +01:00
|
|
|
The frequency and scope of the <command>VACUUM</> operations performed for each of
|
|
|
|
these reasons will vary depending on the needs of each site.
|
2001-08-26 23:17:12 +02:00
|
|
|
Therefore, database administrators must understand these issues and
|
|
|
|
develop an appropriate maintenance strategy. This section concentrates
|
|
|
|
on explaining the high-level issues; for details about command syntax
|
2004-12-27 23:30:10 +01:00
|
|
|
and so on, see the <xref linkend="sql-vacuum"> reference page.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Beginning in <productname>PostgreSQL</productname> 7.2, the standard form
|
|
|
|
of <command>VACUUM</> can run in parallel with normal database operations
|
2003-03-24 15:32:51 +01:00
|
|
|
(selects, inserts, updates, deletes, but not changes to table definitions).
|
2001-08-26 23:17:12 +02:00
|
|
|
Routine vacuuming is therefore not nearly as intrusive as it was in prior
|
2003-12-14 01:10:32 +01:00
|
|
|
releases, and it is not as critical to try to schedule it at low-usage
|
2001-08-26 23:17:12 +02:00
|
|
|
times of day.
|
|
|
|
</para>
|
|
|
|
|
2004-12-27 23:30:10 +01:00
|
|
|
<para>
|
|
|
|
Beginning in <productname>PostgreSQL</productname> 8.0, there are
|
|
|
|
configuration parameters that can be adjusted to further reduce the
|
|
|
|
performance impact of background vacuuming. See
|
|
|
|
<xref linkend="runtime-config-resource-vacuum-cost">.
|
|
|
|
</para>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<sect2 id="vacuum-for-space-recovery">
|
|
|
|
<title>Recovering disk space</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="vacuum-for-space-recovery">
|
|
|
|
<primary>disk space</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
2002-06-23 05:37:12 +02:00
|
|
|
In normal <productname>PostgreSQL</productname> operation, an
|
|
|
|
<command>UPDATE</> or <command>DELETE</> of a row does not
|
2003-11-01 02:56:29 +01:00
|
|
|
immediately remove the old version of the row.
|
2002-06-23 05:37:12 +02:00
|
|
|
This approach is necessary to gain the benefits of multiversion
|
2003-11-01 02:56:29 +01:00
|
|
|
concurrency control (see <xref linkend="mvcc">): the row version
|
2002-06-23 05:37:12 +02:00
|
|
|
must not be deleted while it is still potentially visible to other
|
2003-11-01 02:56:29 +01:00
|
|
|
transactions. But eventually, an outdated or deleted row version is no
|
2002-06-23 05:37:12 +02:00
|
|
|
longer of interest to any transaction. The space it occupies must be
|
2003-11-01 02:56:29 +01:00
|
|
|
reclaimed for reuse by new rows, to avoid infinite growth of disk
|
2002-06-23 05:37:12 +02:00
|
|
|
space requirements. This is done by running <command>VACUUM</>.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Clearly, a table that receives frequent updates or deletes will need
|
2002-06-23 05:37:12 +02:00
|
|
|
to be vacuumed more often than tables that are seldom updated. It
|
|
|
|
may be useful to set up periodic <application>cron</> tasks that
|
2003-12-14 01:10:32 +01:00
|
|
|
<command>VACUUM</command> only selected tables, skipping tables that are known not to
|
2002-06-23 05:37:12 +02:00
|
|
|
change often. This is only likely to be helpful if you have both
|
2004-11-15 07:32:15 +01:00
|
|
|
large heavily-updated tables and large seldom-updated tables — the
|
2002-06-23 05:37:12 +02:00
|
|
|
extra cost of vacuuming a small table isn't enough to be worth
|
|
|
|
worrying about.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2003-12-14 01:10:32 +01:00
|
|
|
There are two variants of the <command>VACUUM</command>
|
|
|
|
command. The first form, known as <quote>lazy vacuum</quote> or
|
|
|
|
just <command>VACUUM</command>, marks expired data in tables and
|
|
|
|
indexes for future reuse; it does <emphasis>not</emphasis> attempt
|
|
|
|
to reclaim the space used by this expired data
|
|
|
|
immediately. Therefore, the table file is not shortened, and any
|
|
|
|
unused space in the file is not returned to the operating
|
|
|
|
system. This variant of <command>VACUUM</command> can be run
|
|
|
|
concurrently with normal database operations.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The second form is the <command>VACUUM FULL</command>
|
|
|
|
command. This uses a more aggressive algorithm for reclaiming the
|
|
|
|
space consumed by expired row versions. Any space that is freed by
|
|
|
|
<command>VACUUM FULL</command> is immediately returned to the
|
|
|
|
operating system. Unfortunately, this variant of the
|
|
|
|
<command>VACUUM</command> command acquires an exclusive lock on
|
|
|
|
each table while <command>VACUUM FULL</command> is processing
|
|
|
|
it. Therefore, frequently using <command>VACUUM FULL</command> can
|
|
|
|
have an extremely negative effect on the performance of concurrent
|
|
|
|
database queries.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The standard form of <command>VACUUM</> is best used with the goal
|
|
|
|
of maintaining a fairly level steady-state usage of disk space. If
|
|
|
|
you need to return disk space to the operating system you can use
|
2004-11-15 07:32:15 +01:00
|
|
|
<command>VACUUM FULL</> — but what's the point of releasing disk
|
2003-12-14 01:10:32 +01:00
|
|
|
space that will only have to be allocated again soon? Moderately
|
|
|
|
frequent standard <command>VACUUM</> runs are a better approach
|
|
|
|
than infrequent <command>VACUUM FULL</> runs for maintaining
|
|
|
|
heavily-updated tables.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Recommended practice for most sites is to schedule a database-wide
|
2003-12-14 01:10:32 +01:00
|
|
|
<command>VACUUM</> once a day at a low-usage time of day,
|
|
|
|
supplemented by more frequent vacuuming of heavily-updated tables
|
2004-12-27 23:30:10 +01:00
|
|
|
if necessary. (Some installations with an extremely high
|
|
|
|
rate of data modification <command>VACUUM</command> busy tables as
|
|
|
|
often as once every few minutes.) If you have multiple databases
|
2003-12-14 01:10:32 +01:00
|
|
|
in a cluster, don't forget to <command>VACUUM</command> each one;
|
2004-12-27 23:30:10 +01:00
|
|
|
the program <filename>vacuumdb</> may be helpful.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
2004-12-27 23:30:10 +01:00
|
|
|
<tip>
|
|
|
|
<para>
|
|
|
|
The <filename>contrib/pg_autovacuum</> program can be useful for
|
|
|
|
automating high-frequency vacuuming operations.
|
|
|
|
</para>
|
|
|
|
</tip>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
2003-12-14 01:10:32 +01:00
|
|
|
<command>VACUUM FULL</> is recommended for cases where you know
|
|
|
|
you have deleted the majority of rows in a table, so that the
|
|
|
|
steady-state size of the table can be shrunk substantially with
|
|
|
|
<command>VACUUM FULL</>'s more aggressive approach. Use plain
|
|
|
|
<command>VACUUM</>, not <command>VACUUM FULL</>, for routine
|
|
|
|
vacuuming for space recovery.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2003-12-14 01:10:32 +01:00
|
|
|
If you have a table whose contents are deleted on a periodic
|
|
|
|
basis, consider doing it with <command>TRUNCATE</command> rather
|
|
|
|
than using <command>DELETE</command> followed by
|
|
|
|
<command>VACUUM</command>. <command>TRUNCATE</command> removes the
|
2004-12-13 19:05:10 +01:00
|
|
|
entire content of the table immediately, without requiring a
|
2003-12-14 01:10:32 +01:00
|
|
|
subsequent <command>VACUUM</command> or <command>VACUUM
|
|
|
|
FULL</command> to reclaim the now-unused disk space.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="vacuum-for-statistics">
|
|
|
|
<title>Updating planner statistics</title>
|
|
|
|
|
2003-08-31 19:32:24 +02:00
|
|
|
<indexterm zone="vacuum-for-statistics">
|
|
|
|
<primary>statistics</primary>
|
|
|
|
<secondary>of the planner</secondary>
|
|
|
|
</indexterm>
|
|
|
|
|
|
|
|
<indexterm zone="vacuum-for-statistics">
|
|
|
|
<primary>ANALYZE</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
The <productname>PostgreSQL</productname> query planner relies on
|
|
|
|
statistical information about the contents of tables in order to
|
|
|
|
generate good plans for queries. These statistics are gathered by
|
|
|
|
the <command>ANALYZE</> command, which can be invoked by itself or
|
|
|
|
as an optional step in <command>VACUUM</>. It is important to have
|
|
|
|
reasonably accurate statistics, otherwise poor choices of plans may
|
|
|
|
degrade database performance.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
As with vacuuming for space recovery, frequent updates of statistics
|
2002-06-23 05:37:12 +02:00
|
|
|
are more useful for heavily-updated tables than for seldom-updated
|
|
|
|
ones. But even for a heavily-updated table, there may be no need for
|
|
|
|
statistics updates if the statistical distribution of the data is
|
|
|
|
not changing much. A simple rule of thumb is to think about how much
|
2001-08-26 23:17:12 +02:00
|
|
|
the minimum and maximum values of the columns in the table change.
|
2002-06-23 05:37:12 +02:00
|
|
|
For example, a <type>timestamp</type> column that contains the time
|
|
|
|
of row update will have a constantly-increasing maximum value as
|
|
|
|
rows are added and updated; such a column will probably need more
|
|
|
|
frequent statistics updates than, say, a column containing URLs for
|
|
|
|
pages accessed on a website. The URL column may receive changes just
|
|
|
|
as often, but the statistical distribution of its values probably
|
|
|
|
changes relatively slowly.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
It is possible to run <command>ANALYZE</> on specific tables and even
|
|
|
|
just specific columns of a table, so the flexibility exists to update some
|
|
|
|
statistics more frequently than others if your application requires it.
|
|
|
|
In practice, however, the usefulness of this feature is doubtful.
|
|
|
|
Beginning in <productname>PostgreSQL</productname> 7.2,
|
|
|
|
<command>ANALYZE</> is a fairly fast operation even on large tables,
|
|
|
|
because it uses a statistical random sampling of the rows of a table
|
|
|
|
rather than reading every single row. So it's probably much simpler
|
|
|
|
to just run it over the whole database every so often.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<tip>
|
|
|
|
<para>
|
|
|
|
Although per-column tweaking of <command>ANALYZE</> frequency may not be
|
|
|
|
very productive, you may well find it worthwhile to do per-column
|
|
|
|
adjustment of the level of detail of the statistics collected by
|
2002-11-11 21:14:04 +01:00
|
|
|
<command>ANALYZE</>. Columns that are heavily used in <literal>WHERE</> clauses
|
2001-08-26 23:17:12 +02:00
|
|
|
and have highly irregular data distributions may require a finer-grain
|
|
|
|
data histogram than other columns. See <command>ALTER TABLE SET
|
|
|
|
STATISTICS</>.
|
|
|
|
</para>
|
|
|
|
</tip>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Recommended practice for most sites is to schedule a database-wide
|
|
|
|
<command>ANALYZE</> once a day at a low-usage time of day; this can
|
|
|
|
usefully be combined with a nightly <command>VACUUM</>. However,
|
|
|
|
sites with relatively slowly changing table statistics may find that
|
|
|
|
this is overkill, and that less-frequent <command>ANALYZE</> runs
|
|
|
|
are sufficient.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="vacuum-for-wraparound">
|
|
|
|
<title>Preventing transaction ID wraparound failures</title>
|
|
|
|
|
2001-11-12 20:19:39 +01:00
|
|
|
<indexterm zone="vacuum-for-wraparound">
|
|
|
|
<primary>transaction ID</primary>
|
|
|
|
<secondary>wraparound</secondary>
|
|
|
|
</indexterm>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
<productname>PostgreSQL</productname>'s MVCC transaction semantics
|
2003-03-24 15:32:51 +01:00
|
|
|
depend on being able to compare transaction ID (<acronym>XID</>)
|
2003-11-01 02:56:29 +01:00
|
|
|
numbers: a row version with an insertion XID greater than the current
|
2001-08-26 23:17:12 +02:00
|
|
|
transaction's XID is <quote>in the future</> and should not be visible
|
|
|
|
to the current transaction. But since transaction IDs have limited size
|
2003-03-24 15:32:51 +01:00
|
|
|
(32 bits at this writing) a cluster that runs for a long time (more
|
2005-02-20 03:22:07 +01:00
|
|
|
than 4 billion transactions) would suffer <firstterm>transaction ID
|
2001-08-26 23:17:12 +02:00
|
|
|
wraparound</>: the XID counter wraps around to zero, and all of a sudden
|
2004-11-15 07:32:15 +01:00
|
|
|
transactions that were in the past appear to be in the future — which
|
2001-08-28 01:42:34 +02:00
|
|
|
means their outputs become invisible. In short, catastrophic data loss.
|
2001-08-26 23:17:12 +02:00
|
|
|
(Actually the data is still there, but that's cold comfort if you can't
|
|
|
|
get at it.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Prior to <productname>PostgreSQL</productname> 7.2, the only defense
|
2002-06-23 05:37:12 +02:00
|
|
|
against XID wraparound was to re-<command>initdb</> at least every 4
|
|
|
|
billion transactions. This of course was not very satisfactory for
|
|
|
|
high-traffic sites, so a better solution has been devised. The new
|
2003-03-24 15:32:51 +01:00
|
|
|
approach allows a server to remain up indefinitely, without
|
2002-06-23 05:37:12 +02:00
|
|
|
<command>initdb</> or any sort of restart. The price is this
|
|
|
|
maintenance requirement: <emphasis>every table in the database must
|
|
|
|
be vacuumed at least once every billion transactions</emphasis>.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2002-06-23 05:37:12 +02:00
|
|
|
In practice this isn't an onerous requirement, but since the
|
|
|
|
consequences of failing to meet it can be complete data loss (not
|
|
|
|
just wasted disk space or slow performance), some special provisions
|
2005-02-20 03:22:07 +01:00
|
|
|
have been made to help database administrators avoid disaster.
|
|
|
|
For each database in the cluster, <productname>PostgreSQL</productname>
|
|
|
|
keeps track of the time of the last database-wide <command>VACUUM</>.
|
|
|
|
When any database approaches the billion-transaction danger level,
|
|
|
|
the system begins to emit warning messages. If nothing is done, it
|
|
|
|
will eventually shut down normal operations until appropriate
|
|
|
|
manual maintenance is done. The remainder of this
|
2002-06-23 05:37:12 +02:00
|
|
|
section gives the details.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2002-06-23 05:37:12 +02:00
|
|
|
The new approach to XID comparison distinguishes two special XIDs,
|
|
|
|
numbers 1 and 2 (<literal>BootstrapXID</> and
|
|
|
|
<literal>FrozenXID</>). These two XIDs are always considered older
|
|
|
|
than every normal XID. Normal XIDs (those greater than 2) are
|
|
|
|
compared using modulo-2<superscript>31</> arithmetic. This means
|
2001-08-26 23:17:12 +02:00
|
|
|
that for every normal XID, there are two billion XIDs that are
|
2002-06-23 05:37:12 +02:00
|
|
|
<quote>older</> and two billion that are <quote>newer</>; another
|
|
|
|
way to say it is that the normal XID space is circular with no
|
2003-11-01 02:56:29 +01:00
|
|
|
endpoint. Therefore, once a row version has been created with a particular
|
|
|
|
normal XID, the row version will appear to be <quote>in the past</> for
|
2002-06-23 05:37:12 +02:00
|
|
|
the next two billion transactions, no matter which normal XID we are
|
2003-11-01 02:56:29 +01:00
|
|
|
talking about. If the row version still exists after more than two billion
|
2002-06-23 05:37:12 +02:00
|
|
|
transactions, it will suddenly appear to be in the future. To
|
2003-11-01 02:56:29 +01:00
|
|
|
prevent data loss, old row versions must be reassigned the XID
|
2002-06-23 05:37:12 +02:00
|
|
|
<literal>FrozenXID</> sometime before they reach the
|
|
|
|
two-billion-transactions-old mark. Once they are assigned this
|
|
|
|
special XID, they will appear to be <quote>in the past</> to all
|
|
|
|
normal transactions regardless of wraparound issues, and so such
|
2003-11-01 02:56:29 +01:00
|
|
|
row versions will be good until deleted, no matter how long that is. This
|
2002-06-23 05:37:12 +02:00
|
|
|
reassignment of XID is handled by <command>VACUUM</>.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<command>VACUUM</>'s normal policy is to reassign <literal>FrozenXID</>
|
2003-11-01 02:56:29 +01:00
|
|
|
to any row version with a normal XID more than one billion transactions in the
|
2001-08-26 23:17:12 +02:00
|
|
|
past. This policy preserves the original insertion XID until it is not
|
2003-11-01 02:56:29 +01:00
|
|
|
likely to be of interest anymore. (In fact, most row versions will probably
|
2003-03-24 15:32:51 +01:00
|
|
|
live and die without ever being <quote>frozen</>.) With this policy,
|
|
|
|
the maximum safe interval between <command>VACUUM</> runs on any table
|
2001-08-26 23:17:12 +02:00
|
|
|
is exactly one billion transactions: if you wait longer, it's possible
|
2003-11-01 02:56:29 +01:00
|
|
|
that a row version that was not quite old enough to be reassigned last time
|
2001-08-26 23:17:12 +02:00
|
|
|
is now more than two billion transactions old and has wrapped around
|
2004-11-15 07:32:15 +01:00
|
|
|
into the future — i.e., is lost to you. (Of course, it'll reappear
|
2001-08-26 23:17:12 +02:00
|
|
|
after another two billion transactions, but that's no help.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2003-03-24 15:32:51 +01:00
|
|
|
Since periodic <command>VACUUM</> runs are needed anyway for the reasons
|
2001-08-26 23:17:12 +02:00
|
|
|
described earlier, it's unlikely that any table would not be vacuumed
|
|
|
|
for as long as a billion transactions. But to help administrators ensure
|
|
|
|
this constraint is met, <command>VACUUM</> stores transaction ID
|
2003-03-24 15:32:51 +01:00
|
|
|
statistics in the system table <literal>pg_database</>. In particular,
|
|
|
|
the <literal>datfrozenxid</> column of a database's
|
|
|
|
<literal>pg_database</> row is updated at the completion of any
|
2005-02-20 03:22:07 +01:00
|
|
|
database-wide <command>VACUUM</command> operation (i.e.,
|
|
|
|
<command>VACUUM</> that does not
|
2001-08-26 23:17:12 +02:00
|
|
|
name a specific table). The value stored in this field is the freeze
|
|
|
|
cutoff XID that was used by that <command>VACUUM</> command. All normal
|
|
|
|
XIDs older than this cutoff XID are guaranteed to have been replaced by
|
|
|
|
<literal>FrozenXID</> within that database. A convenient way to
|
|
|
|
examine this information is to execute the query
|
2002-11-11 21:14:04 +01:00
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
SELECT datname, age(datfrozenxid) FROM pg_database;
|
2001-08-26 23:17:12 +02:00
|
|
|
</programlisting>
|
2002-11-11 21:14:04 +01:00
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
The <literal>age</> column measures the number of transactions from the
|
|
|
|
cutoff XID to the current transaction's XID.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
With the standard freezing policy, the <literal>age</> column will start
|
|
|
|
at one billion for a freshly-vacuumed database. When the <literal>age</>
|
|
|
|
approaches two billion, the database must be vacuumed again to avoid
|
2003-12-14 01:10:32 +01:00
|
|
|
risk of wraparound failures. Recommended practice is to <command>VACUUM</command> each
|
2001-08-26 23:17:12 +02:00
|
|
|
database at least once every half-a-billion (500 million) transactions,
|
|
|
|
so as to provide plenty of safety margin. To help meet this rule,
|
|
|
|
each database-wide <command>VACUUM</> automatically delivers a warning
|
2003-03-24 15:32:51 +01:00
|
|
|
if there are any <literal>pg_database</> entries showing an
|
2001-08-26 23:17:12 +02:00
|
|
|
<literal>age</> of more than 1.5 billion transactions, for example:
|
2002-11-11 21:14:04 +01:00
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<programlisting>
|
2003-03-24 15:32:51 +01:00
|
|
|
play=# VACUUM;
|
2005-02-20 03:22:07 +01:00
|
|
|
WARNING: database "mydb" must be vacuumed within 177009986 transactions
|
|
|
|
HINT: To avoid a database shutdown, execute a full-database VACUUM in "mydb".
|
2001-08-26 23:17:12 +02:00
|
|
|
VACUUM
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
2005-02-20 03:22:07 +01:00
|
|
|
<para>
|
|
|
|
If the warnings emitted by <command>VACUUM</> go ignored, then
|
|
|
|
<productname>PostgreSQL</productname> will begin to emit a warning
|
|
|
|
like the above on every transaction start once there are fewer than 10
|
|
|
|
million transactions left until wraparound. If those warnings also are
|
|
|
|
ignored, the system will shut down and refuse to execute any new
|
|
|
|
transactions once there are fewer than 1 million transactions left
|
|
|
|
until wraparound:
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
play=# select 2+2;
|
|
|
|
ERROR: database is shut down to avoid wraparound data loss in database "mydb"
|
|
|
|
HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
The 1-million-transaction safety margin exists to let the
|
|
|
|
administrator recover without data loss, by manually executing the
|
|
|
|
required <command>VACUUM</> commands. However, since the system will not
|
|
|
|
execute commands once it has gone into the safety shutdown mode,
|
|
|
|
the only way to do this is to stop the postmaster and use a standalone
|
|
|
|
backend to execute <command>VACUUM</>. The shutdown mode is not enforced
|
|
|
|
by a standalone backend. See the <xref linkend="app-postgres"> reference
|
|
|
|
page for details about using a standalone backend.
|
|
|
|
</para>
|
|
|
|
|
2001-08-26 23:17:12 +02:00
|
|
|
<para>
|
|
|
|
<command>VACUUM</> with the <command>FREEZE</> option uses a more
|
2003-11-01 02:56:29 +01:00
|
|
|
aggressive freezing policy: row versions are frozen if they are old enough
|
2002-06-23 05:37:12 +02:00
|
|
|
to be considered good by all open transactions. In particular, if a
|
|
|
|
<command>VACUUM FREEZE</> is performed in an otherwise-idle
|
2003-11-01 02:56:29 +01:00
|
|
|
database, it is guaranteed that <emphasis>all</> row versions in that
|
2002-06-23 05:37:12 +02:00
|
|
|
database will be frozen. Hence, as long as the database is not
|
|
|
|
modified in any way, it will not need subsequent vacuuming to avoid
|
|
|
|
transaction ID wraparound problems. This technique is used by
|
2003-03-24 15:32:51 +01:00
|
|
|
<command>initdb</> to prepare the <literal>template0</> database.
|
2002-06-23 05:37:12 +02:00
|
|
|
It should also be used to prepare any user-created databases that
|
|
|
|
are to be marked <literal>datallowconn</> = <literal>false</> in
|
2003-03-24 15:32:51 +01:00
|
|
|
<literal>pg_database</>, since there isn't any convenient way to
|
2005-02-20 03:22:07 +01:00
|
|
|
<command>VACUUM</command> a database that you can't connect to.
|
2001-08-26 23:17:12 +02:00
|
|
|
</para>
|
|
|
|
|
2004-12-27 23:30:10 +01:00
|
|
|
<warning>
|
|
|
|
<para>
|
2005-02-20 03:22:07 +01:00
|
|
|
A database that is marked <literal>datallowconn</> = <literal>false</>
|
|
|
|
in <literal>pg_database</> is assumed to be properly frozen; the
|
|
|
|
automatic warnings and wraparound protection shutdown do not take
|
|
|
|
such databases into account. Therefore it's up to you to ensure
|
|
|
|
you've correctly frozen a database before you mark it with
|
|
|
|
<literal>datallowconn</> = <literal>false</>.
|
2004-12-27 23:30:10 +01:00
|
|
|
</para>
|
|
|
|
</warning>
|
2001-08-26 23:17:12 +02:00
|
|
|
</sect2>
|
|
|
|
</sect1>
|
2001-11-20 05:27:49 +01:00
|
|
|
|
2002-06-13 06:36:50 +02:00
|
|
|
|
2002-06-22 06:08:07 +02:00
|
|
|
<sect1 id="routine-reindex">
|
|
|
|
<title>Routine Reindexing</title>
|
|
|
|
|
|
|
|
<indexterm zone="routine-reindex">
|
|
|
|
<primary>reindex</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
|
|
|
<para>
|
2003-09-13 00:17:24 +02:00
|
|
|
In some situations it is worthwhile to rebuild indexes periodically
|
|
|
|
with the <command>REINDEX</> command. (There is also
|
|
|
|
<filename>contrib/reindexdb</> which can reindex an entire database.)
|
|
|
|
However, <productname>PostgreSQL</> 7.4 has substantially reduced the need
|
|
|
|
for this activity compared to earlier releases.
|
2002-06-22 06:08:07 +02:00
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
2001-11-20 05:27:49 +01:00
|
|
|
<sect1 id="logfile-maintenance">
|
|
|
|
<title>Log File Maintenance</title>
|
|
|
|
|
|
|
|
<indexterm zone="logfile-maintenance">
|
2003-08-31 19:32:24 +02:00
|
|
|
<primary>server log</primary>
|
|
|
|
<secondary>log file maintenance</secondary>
|
2001-11-20 05:27:49 +01:00
|
|
|
</indexterm>
|
|
|
|
|
|
|
|
<para>
|
2003-12-14 01:10:32 +01:00
|
|
|
It is a good idea to save the database server's log output
|
|
|
|
somewhere, rather than just routing it to <filename>/dev/null</>.
|
|
|
|
The log output is invaluable when it comes time to diagnose
|
|
|
|
problems. However, the log output tends to be voluminous
|
|
|
|
(especially at higher debug levels) and you won't want to save it
|
|
|
|
indefinitely. You need to <quote>rotate</> the log files so that
|
|
|
|
new log files are started and old ones removed after a reasonable
|
|
|
|
period of time.
|
2001-11-20 05:27:49 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2004-08-06 01:32:13 +02:00
|
|
|
If you simply direct the <systemitem>stderr</> of the
|
|
|
|
<command>postmaster</command> into a
|
|
|
|
file, you will have log output, but
|
|
|
|
the only way to truncate the log file is to stop and restart
|
2003-12-14 01:10:32 +01:00
|
|
|
the <command>postmaster</command>. This may be OK if you are using
|
|
|
|
<productname>PostgreSQL</productname> in a development environment,
|
|
|
|
but few production servers would find this behavior acceptable.
|
2001-11-20 05:27:49 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2004-08-06 01:32:13 +02:00
|
|
|
A better approach is to send the <command>postmaster</>'s
|
|
|
|
<systemitem>stderr</> output to some type of log rotation program.
|
|
|
|
There is a built-in log rotation program, which you can use by
|
|
|
|
setting the configuration parameter <literal>redirect_stderr</> to
|
|
|
|
<literal>true</> in <filename>postgresql.conf</>. The control
|
|
|
|
parameters for this program are described in <xref
|
|
|
|
linkend="runtime-config-logging-where">.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Alternatively, you might prefer to use an external log rotation
|
|
|
|
program, if you have one that you are already using with other
|
|
|
|
server software. For example, the <application>rotatelogs</application>
|
|
|
|
tool included in the <productname>Apache</productname> distribution
|
|
|
|
can be used with <productname>PostgreSQL</productname>. To do this,
|
|
|
|
just pipe the <command>postmaster</>'s
|
|
|
|
<systemitem>stderr</> output to the desired program.
|
|
|
|
If you start the server with
|
|
|
|
<command>pg_ctl</>, then <systemitem>stderr</>
|
|
|
|
is already redirected to <systemitem>stdout</>, so you just need a
|
2004-12-27 23:30:10 +01:00
|
|
|
pipe command, for example:
|
2004-08-06 01:32:13 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
pg_ctl start | rotatelogs /var/log/pgsql_log 86400
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Another production-grade approach to managing log output is to
|
2003-11-01 02:56:29 +01:00
|
|
|
send it all to <application>syslog</> and let
|
|
|
|
<application>syslog</> deal with file rotation. To do this, set the
|
2004-08-06 01:32:13 +02:00
|
|
|
configuration parameter <literal>log_destination</> to <literal>syslog</>
|
|
|
|
(to log to <application>syslog</> only) in
|
|
|
|
<filename>postgresql.conf</>. Then you can send a <literal>SIGHUP</literal>
|
|
|
|
signal to the <application>syslog</> daemon whenever you want to force it
|
|
|
|
to start writing a new log file. If you want to automate log
|
2004-05-16 21:34:46 +02:00
|
|
|
rotation, the <application>logrotate</application> program can be
|
2003-11-01 02:56:29 +01:00
|
|
|
configured to work with log files from
|
|
|
|
<application>syslog</application>.
|
2004-03-15 15:15:45 +01:00
|
|
|
</para>
|
2003-10-09 21:05:09 +02:00
|
|
|
|
2004-03-15 15:15:45 +01:00
|
|
|
<para>
|
|
|
|
On many systems, however, <application>syslog</> is not very reliable,
|
|
|
|
particularly with large log messages; it may truncate or drop messages
|
2004-12-27 23:30:10 +01:00
|
|
|
just when you need them the most. Also, on <productname>Linux</>,
|
2004-03-15 15:21:30 +01:00
|
|
|
<application>syslog</> will sync each message to disk, yielding poor
|
2004-08-06 01:32:13 +02:00
|
|
|
performance. (You can use a <literal>-</> at the start of the file name
|
2004-12-13 19:05:10 +01:00
|
|
|
in the <application>syslog</> configuration file to disable this behavior.)
|
2001-11-20 05:27:49 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2004-08-06 01:32:13 +02:00
|
|
|
Note that all the solutions described above take care of starting new
|
|
|
|
log files at configurable intervals, but they do not handle deletion
|
2004-12-27 23:30:10 +01:00
|
|
|
of old, no-longer-interesting log files. You will probably want to set
|
|
|
|
up a batch job to periodically delete old log files. Another possibility
|
|
|
|
is to configure the rotation program so that old log files are overwritten
|
|
|
|
cyclically.
|
2001-11-20 05:27:49 +01:00
|
|
|
</para>
|
|
|
|
</sect1>
|
2001-08-26 23:17:12 +02:00
|
|
|
</chapter>
|
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode:sgml
|
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|