<!-- doc/src/sgml/amcheck.sgml -->
|
|
|
|
<sect1 id="amcheck" xreflabel="amcheck">
|
|
<title>amcheck — tools to verify table and index consistency</title>
|
|
|
|
<indexterm zone="amcheck">
|
|
<primary>amcheck</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The <filename>amcheck</filename> module provides functions that allow you to
|
|
verify the logical consistency of the structure of relations.
|
|
</para>
|
|
|
|
<para>
|
|
The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
|
|
structure of the representation of particular relations. The
|
|
correctness of the access method functions behind index scans and
|
|
other important operations relies on these invariants always
|
|
holding. For example, certain functions verify, among other things,
|
|
that all B-Tree pages have items in <quote>logical</quote> order (e.g.,
|
|
for B-Tree indexes on <type>text</type>, index tuples should be in
|
|
collated lexical order). If that particular invariant somehow fails
|
|
to hold, we can expect binary searches on the affected page to
|
|
incorrectly guide index scans, resulting in wrong answers to SQL
|
|
queries. If the structure appears to be valid, no error is raised.
|
|
</para>
|
|
<para>
|
|
Verification is performed using the same procedures as those used by
|
|
index scans themselves, which may be user-defined operator class
|
|
code. For example, B-Tree index verification relies on comparisons
|
|
made with one or more B-Tree support function 1 routines. See <xref
|
|
linkend="xindex-support"/> for details of operator class support
|
|
functions.
|
|
</para>
|
|
<para>
|
|
Unlike the B-Tree checking functions, which report corruption by raising
errors, the heap checking function <function>verify_heapam</function> checks
a table and attempts to return a set of rows, one row per corruption
detected. Despite this, if facilities that
<function>verify_heapam</function> relies upon are themselves corrupted, the
function may be unable to continue and may instead raise an error.
|
|
</para>
|
|
<para>
|
|
Permission to execute <filename>amcheck</filename> functions may be granted
|
|
to non-superusers, but before granting such permissions careful consideration
|
|
should be given to data security and privacy concerns. Although the
|
|
corruption reports generated by these functions do not focus on the contents
|
|
of the corrupted data so much as on the structure of that data and the nature
|
|
of the corruptions found, an attacker who gains permission to execute these
|
|
functions, particularly if the attacker can also induce corruption, might be
|
|
able to infer something of the data itself from such messages.
|
|
</para>
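<para>
As a minimal sketch, execute permission could be granted on a single
function to a dedicated role. The role name
<literal>amcheck_runner</literal> is hypothetical, and the argument list
assumes the two-argument form of <function>bt_index_check</function>
described in <xref linkend="amcheck-functions"/>:
<programlisting>
-- amcheck_runner is a hypothetical role used only for illustration
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO amcheck_runner;
</programlisting>
Granting each function separately keeps the set of callable checks as
narrow as the situation requires.
</para>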
|
|
|
|
<sect2 id="amcheck-functions">
|
|
<title>Functions</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>
|
|
<function>bt_index_check(index regclass, heapallindexed boolean) returns void</function>
|
|
<indexterm>
|
|
<primary>bt_index_check</primary>
|
|
</indexterm>
|
|
</term>
|
|
|
|
<listitem>
|
|
<para>
|
|
<function>bt_index_check</function> tests that its target, a
|
|
B-Tree index, respects a variety of invariants. Example usage:
|
|
<screen>
test=# SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |             relname             | relpages
----------------+---------------------------------+----------
                | pg_depend_reference_index       |       43
                | pg_depend_depender_index        |       40
                | pg_proc_proname_args_nsp_index  |       31
                | pg_description_o_c_o_index      |       21
                | pg_attribute_relid_attnam_index |       14
                | pg_proc_oid_index               |       10
                | pg_attribute_relid_attnum_index |        9
                | pg_amproc_fam_proc_index        |        5
                | pg_amop_opr_fam_index           |        5
                | pg_amop_fam_strat_index         |        5
(10 rows)
</screen>
|
|
This example shows a session that performs verification of the
|
|
10 largest catalog indexes in the database <quote>test</quote>.
|
|
Verification of the presence of heap tuples as index tuples is
|
|
requested for the subset that are unique indexes. Since no
|
|
error is raised, all indexes tested appear to be logically
|
|
consistent. Naturally, this query could easily be changed to
|
|
call <function>bt_index_check</function> for every index in the
|
|
database where verification is supported.
|
|
</para>
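<para>
One way the query above might be adapted to cover every B-Tree index in
the current database is sketched below. This is an illustrative variant,
not a definitive recipe: it assumes that joining
<structname>pg_class</structname> to <structname>pg_am</structname>
through <structfield>relam</structfield> is sufficient to identify B-Tree
indexes, and it leaves <parameter>heapallindexed</parameter> off to limit
overhead:
<programlisting>
SELECT bt_index_check(index => c.oid, heapallindexed => false),
       n.nspname,
       c.relname
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_am am ON c.relam = am.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Skip indexes that are not ready or not valid:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY n.nspname, c.relname;
</programlisting>
Because the first error raised ends the query, indexes that sort after a
corrupt index are not checked in the same run.
</para>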
|
|
<para>
|
|
<function>bt_index_check</function> acquires an <literal>AccessShareLock</literal>
|
|
on the target index and the heap relation it belongs to. This lock mode
|
|
is the same lock mode acquired on relations by simple
|
|
<literal>SELECT</literal> statements.
|
|
<function>bt_index_check</function> does not verify invariants
|
|
that span child/parent relationships, but will verify the
|
|
presence of all heap tuples as index tuples within the index
|
|
when <parameter>heapallindexed</parameter> is
|
|
<literal>true</literal>. When a routine, lightweight test for
|
|
corruption is required in a live production environment, using
|
|
<function>bt_index_check</function> often provides the best
|
|
trade-off between thoroughness of verification and limiting the
|
|
impact on application performance and availability.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>
|
|
<function>bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void</function>
|
|
<indexterm>
|
|
<primary>bt_index_parent_check</primary>
|
|
</indexterm>
|
|
</term>
|
|
|
|
<listitem>
|
|
<para>
|
|
<function>bt_index_parent_check</function> tests that its
|
|
target, a B-Tree index, respects a variety of invariants.
|
|
Optionally, when the <parameter>heapallindexed</parameter>
|
|
argument is <literal>true</literal>, the function verifies the
|
|
presence of all heap tuples that should be found within the
|
|
index. When the optional <parameter>rootdescend</parameter>
|
|
argument is <literal>true</literal>, verification re-finds
|
|
tuples on the leaf level by performing a new search from the
|
|
root page for each tuple. The checks that can be performed by
|
|
<function>bt_index_parent_check</function> are a superset of the
|
|
checks that can be performed by <function>bt_index_check</function>.
|
|
<function>bt_index_parent_check</function> can be thought of as
|
|
a more thorough variant of <function>bt_index_check</function>:
|
|
unlike <function>bt_index_check</function>,
|
|
<function>bt_index_parent_check</function> also checks
|
|
invariants that span parent/child relationships, including checking
|
|
that there are no missing downlinks in the index structure.
|
|
<function>bt_index_parent_check</function> follows the general
|
|
convention of raising an error if it finds a logical
|
|
inconsistency or other problem.
|
|
</para>
|
|
<para>
|
|
A <literal>ShareLock</literal> is required on the target index by
|
|
<function>bt_index_parent_check</function> (a
|
|
<literal>ShareLock</literal> is also acquired on the heap relation).
|
|
These locks prevent concurrent data modification from
|
|
<command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command>
|
|
commands. The locks also prevent the underlying relation from
|
|
being concurrently processed by <command>VACUUM</command>, as well as
|
|
all other utility commands. Note that the function holds locks
|
|
only while running, not for the entire transaction.
|
|
</para>
|
|
<para>
|
|
<function>bt_index_parent_check</function>'s additional
|
|
verification is more likely to detect various pathological
|
|
cases. These cases may involve an incorrectly implemented
|
|
B-Tree operator class used by the index that is checked, or,
|
|
hypothetically, undiscovered bugs in the underlying B-Tree index
|
|
access method code. Note that
|
|
<function>bt_index_parent_check</function> cannot be used when
|
|
hot standby mode is enabled (i.e., on read-only physical
|
|
replicas), unlike <function>bt_index_check</function>.
|
|
</para>
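<para>
A minimal sketch of a more thorough check on a single index follows. The
index name <literal>my_index</literal> is hypothetical, and the call is
assumed to run on a primary server at a time when the locks described
above are acceptable:
<programlisting>
-- my_index is a hypothetical index name
SELECT bt_index_parent_check(index => 'my_index'::regclass,
                             heapallindexed => true,
                             rootdescend => true);
</programlisting>
If no error is raised, none of the checked invariants were found to be
violated.
</para>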
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<tip>
|
|
<para>
|
|
<function>bt_index_check</function> and
|
|
<function>bt_index_parent_check</function> both output log
|
|
messages about the verification process at
|
|
<literal>DEBUG1</literal> and <literal>DEBUG2</literal> severity
|
|
levels. These messages provide detailed information about the
|
|
verification process that may be of interest to
|
|
<productname>PostgreSQL</productname> developers. Advanced users
|
|
may also find this information helpful, since it provides
|
|
additional context should verification actually detect an
|
|
inconsistency. Running:
|
|
<programlisting>
SET client_min_messages = DEBUG1;
</programlisting>
|
|
in an interactive <application>psql</application> session before
|
|
running a verification query will display messages about the
|
|
progress of verification with a manageable level of detail.
|
|
</para>
|
|
</tip>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>
|
|
<function>
verify_heapam(relation regclass,
              on_error_stop boolean,
              check_toast boolean,
              skip text,
              startblock bigint,
              endblock bigint,
              blkno OUT bigint,
              offnum OUT integer,
              attnum OUT integer,
              msg OUT text)
returns setof record
</function>
|
|
</term>
|
|
<listitem>
|
|
<para>
|
|
Checks a table, sequence, or materialized view for structural corruption,
|
|
where pages in the relation contain data that is invalidly formatted, and
|
|
for logical corruption, where pages are structurally valid but
|
|
inconsistent with the rest of the database cluster.
|
|
</para>
|
|
<para>
|
|
The following optional arguments are recognized:
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>on_error_stop</literal></term>
|
|
<listitem>
|
|
<para>
|
|
If true, corruption checking stops at the end of the first block in
|
|
which any corruptions are found.
|
|
</para>
|
|
<para>
|
|
Defaults to false.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>check_toast</literal></term>
|
|
<listitem>
|
|
<para>
|
|
If true, toasted values are checked against the target relation's
|
|
TOAST table.
|
|
</para>
|
|
<para>
|
|
This option is known to be slow. Also, if the TOAST table or its
index is corrupt, checking it against toasted values could conceivably
crash the server, although in many cases this would just produce an
error.
|
|
</para>
|
|
<para>
|
|
Defaults to false.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>skip</literal></term>
|
|
<listitem>
|
|
<para>
|
|
If not <literal>none</literal>, corruption checking skips blocks that
|
|
are marked as all-visible or all-frozen, as specified.
|
|
Valid options are <literal>all-visible</literal>,
|
|
<literal>all-frozen</literal> and <literal>none</literal>.
|
|
</para>
|
|
<para>
|
|
Defaults to <literal>none</literal>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>startblock</literal></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, corruption checking begins at the specified block,
|
|
skipping all previous blocks. It is an error to specify a
|
|
<parameter>startblock</parameter> outside the range of blocks in the
|
|
target table.
|
|
</para>
|
|
<para>
|
|
By default, checking begins at the first block.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>endblock</literal></term>
|
|
<listitem>
|
|
<para>
|
|
If specified, corruption checking ends at the specified block,
|
|
skipping all remaining blocks. It is an error to specify an
|
|
<parameter>endblock</parameter> outside the range of blocks in the target
|
|
table.
|
|
</para>
|
|
<para>
|
|
By default, all blocks are checked.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>
|
|
For each corruption detected, <function>verify_heapam</function> returns
|
|
a row with the following columns:
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>blkno</literal></term>
|
|
<listitem>
|
|
<para>
|
|
The number of the block containing the corrupt page.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>offnum</literal></term>
|
|
<listitem>
|
|
<para>
|
|
The OffsetNumber of the corrupt tuple.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>attnum</literal></term>
|
|
<listitem>
|
|
<para>
|
|
The attribute number of the corrupt column in the tuple, if the
|
|
corruption is specific to a column and not the tuple as a whole.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><literal>msg</literal></term>
|
|
<listitem>
|
|
<para>
|
|
A message describing the problem detected.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
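<para>
As a sketch, the call below checks a hypothetical table named
<literal>my_table</literal>, including its toasted values, while skipping
blocks marked all-frozen. The table name and the chosen option values are
assumptions made for illustration:
<programlisting>
-- my_table is a hypothetical table name
SELECT blkno, offnum, attnum, msg
FROM verify_heapam(relation => 'my_table'::regclass,
                   on_error_stop => false,
                   check_toast => true,
                   skip => 'all-frozen');
</programlisting>
An empty result means that no corruption was detected in the blocks that
were checked.
</para>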
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2 id="amcheck-optional-heapallindexed-verification">
|
|
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
|
|
<para>
|
|
When the <parameter>heapallindexed</parameter> argument to B-Tree
|
|
verification functions is <literal>true</literal>, an additional
|
|
phase of verification is performed against the table associated with
|
|
the target index relation. This consists of a <quote>dummy</quote>
|
|
<command>CREATE INDEX</command> operation, which checks for the
|
|
presence of all hypothetical new index tuples against a temporary,
|
|
in-memory summarizing structure (this is built when needed during
|
|
the basic first phase of verification). The summarizing structure
|
|
<quote>fingerprints</quote> every tuple found within the target
|
|
index. The high-level principle behind
|
|
<parameter>heapallindexed</parameter> verification is that a new
|
|
index that is equivalent to the existing, target index must only
|
|
have entries that can be found in the existing structure.
|
|
</para>
|
|
<para>
|
|
The additional <parameter>heapallindexed</parameter> phase adds
|
|
significant overhead: verification will typically take several times
|
|
longer. However, there is no change to the relation-level locks
|
|
acquired when <parameter>heapallindexed</parameter> verification is
|
|
performed.
|
|
</para>
|
|
<para>
|
|
The summarizing structure is bounded in size by
|
|
<varname>maintenance_work_mem</varname>. In order to ensure that
|
|
there is no more than a 2% probability of failure to detect an
|
|
inconsistency for each heap tuple that should be represented in the
|
|
index, approximately 2 bytes of memory are needed per tuple. As
|
|
less memory is made available per tuple, the probability of missing
|
|
an inconsistency slowly increases. This approach limits the
|
|
overhead of verification significantly, while only slightly reducing
|
|
the probability of detecting a problem, especially for installations
|
|
where verification is treated as a routine maintenance task. Any
|
|
single absent or malformed tuple has a new opportunity to be
|
|
detected with each new verification attempt.
|
|
</para>
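<para>
As a rough sketch of how the figures above might be applied, the first
query below estimates the memory implied by about 2 bytes per tuple, and
the remaining statements raise <varname>maintenance_work_mem</varname>
for a single check. The names <literal>my_table</literal> and
<literal>my_index</literal> and the <literal>256MB</literal> setting are
illustrative assumptions, and <structfield>reltuples</structfield> is
only the planner's estimate of the row count:
<programlisting>
-- Approximate memory for about 2 bytes per heap tuple (estimate only)
SELECT pg_size_pretty((greatest(reltuples, 0) * 2)::bigint) AS approx_memory
FROM pg_class
WHERE oid = 'my_table'::regclass;

-- Raise the limit for this session, run the check, then restore it
SET maintenance_work_mem = '256MB';
SELECT bt_index_check(index => 'my_index'::regclass, heapallindexed => true);
RESET maintenance_work_mem;
</programlisting>
</para>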
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="amcheck-using-amcheck-effectively">
|
|
<title>Using <filename>amcheck</filename> Effectively</title>
|
|
|
|
<para>
|
|
<filename>amcheck</filename> can be effective at detecting various types of
|
|
failure modes that <link
|
|
linkend="app-initdb-data-checksums"><application>data
|
|
checksums</application></link> will fail to catch. These include:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Structural inconsistencies caused by incorrect operator class
|
|
implementations.
|
|
</para>
|
|
<para>
|
|
This includes issues caused by the comparison rules of operating
|
|
system collations changing. Comparisons of datums of a collatable
|
|
type like <type>text</type> must be immutable (just as all
|
|
comparisons used for B-Tree index scans must be immutable), which
|
|
implies that operating system collation rules must never change.
|
|
Though rare, updates to operating system collation rules can
|
|
cause these issues. More commonly, an inconsistency in the
|
|
collation order between a primary server and a standby server is
|
|
implicated, possibly because the <emphasis>major</emphasis> operating
|
|
system version in use is inconsistent. Such inconsistencies will
|
|
generally only arise on standby servers, and so can generally
|
|
only be detected on standby servers.
|
|
</para>
|
|
<para>
|
|
If a problem like this arises, it may not affect each individual
|
|
index that is ordered using an affected collation, simply because
|
|
<emphasis>indexed</emphasis> values might happen to have the same
|
|
absolute ordering regardless of the behavioral inconsistency. See
|
|
<xref linkend="locale"/> and <xref linkend="collation"/> for
|
|
further details about how <productname>PostgreSQL</productname> uses
|
|
operating system locales and collations.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Structural inconsistencies between indexes and the heap relations
|
|
that are indexed (when <parameter>heapallindexed</parameter>
|
|
verification is performed).
|
|
</para>
|
|
<para>
|
|
There is no cross-checking of indexes against their heap relation
|
|
during normal operation. Symptoms of heap corruption can be subtle.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Corruption caused by hypothetical undiscovered bugs in the
|
|
underlying <productname>PostgreSQL</productname> access method
|
|
code, sort code, or transaction management code.
|
|
</para>
|
|
<para>
|
|
Automatic verification of the structural integrity of indexes
|
|
plays a role in the general testing of new or proposed
|
|
<productname>PostgreSQL</productname> features that could plausibly allow a
|
|
logical inconsistency to be introduced. Verification of table
|
|
structure and associated visibility and transaction status
|
|
information plays a similar role. One obvious testing strategy
|
|
is to call <filename>amcheck</filename> functions continuously
|
|
when running the standard regression tests. See <xref
|
|
linkend="regress-run"/> for details on running the tests.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
File system or storage subsystem faults when checksums are not enabled.
|
|
</para>
|
|
<para>
|
|
Note that <filename>amcheck</filename> examines a page as represented in some
|
|
shared memory buffer at the time of verification if there is only a
|
|
shared buffer hit when accessing the block. Consequently,
|
|
<filename>amcheck</filename> does not necessarily examine data read from the
|
|
file system at the time of verification. Note that when checksums are
|
|
enabled, <filename>amcheck</filename> may raise an error due to a checksum
|
|
failure when a corrupt block is read into a buffer.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Corruption caused by faulty RAM, or the broader memory subsystem.
|
|
</para>
|
|
<para>
|
|
<productname>PostgreSQL</productname> does not protect against correctable
memory errors, and it is assumed that you will operate using RAM that
|
|
uses industry standard Error Correcting Codes (ECC) or better
|
|
protection. However, ECC memory is typically only immune to
|
|
single-bit errors, and should not be assumed to provide
|
|
<emphasis>absolute</emphasis> protection against failures that
|
|
result in memory corruption.
|
|
</para>
|
|
<para>
|
|
When <parameter>heapallindexed</parameter> verification is
|
|
performed, there is generally a greatly increased chance of
|
|
detecting single-bit errors, since strict binary equality is
|
|
tested, and the indexed attributes within the heap are tested.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
Structural corruption can happen due to faulty storage hardware, or
|
|
relation files being overwritten or modified by unrelated software.
|
|
This kind of corruption can also be detected with
|
|
<link linkend="checksums"><application>data page
|
|
checksums</application></link>.
|
|
</para>
|
|
|
|
<para>
|
|
Relation pages that are correctly formatted, internally consistent, and
|
|
correct relative to their own internal checksums may still contain
|
|
logical corruption. As such, this kind of corruption cannot be detected
|
|
with <application>checksums</application>. Examples include toasted
|
|
values in the main table which lack a corresponding entry in the toast
|
|
table, and tuples in the main table with a Transaction ID that is older
|
|
than the oldest valid Transaction ID in the database or cluster.
|
|
</para>
|
|
|
|
<para>
|
|
Multiple causes of logical corruption have been observed in production
|
|
systems, including bugs in the <productname>PostgreSQL</productname>
|
|
server software, faulty and ill-conceived backup and restore tools, and
|
|
user error.
|
|
</para>
|
|
|
|
<para>
|
|
Corrupt relations are most concerning in live production environments,
|
|
precisely the same environments where high-risk activities are least
|
|
welcome. For this reason, <function>verify_heapam</function> has been
|
|
designed to diagnose corruption without undue risk. It cannot guard
|
|
against all causes of backend crashes, as even executing the calling
|
|
query could be unsafe on a badly corrupted system. Access to <link
|
|
linkend="catalogs-overview">catalog tables</link> is performed and could
|
|
be problematic if the catalogs themselves are corrupted.
|
|
</para>
|
|
|
|
<para>
|
|
In general, <filename>amcheck</filename> can only prove the presence of
|
|
corruption; it cannot prove its absence.
|
|
</para>
|
|
|
|
</sect2>
|
|
<sect2 id="amcheck-repairing-corruption">
|
|
<title>Repairing Corruption</title>
|
|
<para>
|
|
No error concerning corruption raised by <filename>amcheck</filename> should
|
|
ever be a false positive. <filename>amcheck</filename> raises
|
|
errors in the event of conditions that, by definition, should never
|
|
happen, and so careful analysis of <filename>amcheck</filename>
|
|
errors is often required.
|
|
</para>
|
|
<para>
|
|
There is no general method of repairing problems that
|
|
<filename>amcheck</filename> detects. An explanation for the root cause of
|
|
an invariant violation should be sought. <xref
|
|
linkend="pageinspect"/> may play a useful role in diagnosing
|
|
corruption that <filename>amcheck</filename> detects. A <command>REINDEX</command>
|
|
may not be effective in repairing corruption.
|
|
</para>
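<para>
As an illustrative sketch only, assuming the
<filename>pageinspect</filename> extension is installed, the items on a
single B-Tree page could be examined as follows. The index name
<literal>my_index</literal> and block number 1 are placeholders; the
block of interest would normally be taken from the details of the
reported error:
<programlisting>
-- my_index and block 1 are placeholders for illustration
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('my_index', 1);
</programlisting>
Inspecting a page this way only reads it and does not repair anything.
</para>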
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|