
<!-- doc/src/sgml/amcheck.sgml -->

<sect1 id="amcheck" xreflabel="amcheck">
 <title>amcheck</title>

 <indexterm zone="amcheck">
  <primary>amcheck</primary>
 </indexterm>

 <para>
  The <filename>amcheck</filename> module provides functions that allow you to
  verify the logical consistency of the structure of relations.
 </para>

 <para>
  The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
  structure of the representation of particular relations.  The
  correctness of the access method functions behind index scans and
  other important operations relies on these invariants always
  holding.  For example, certain functions verify, among other things,
  that all B-Tree pages have items in <quote>logical</quote> order (e.g.,
  for B-Tree indexes on <type>text</type>, index tuples should be in
  collated lexical order).  If that particular invariant somehow fails
  to hold, we can expect binary searches on the affected page to
  incorrectly guide index scans, resulting in wrong answers to SQL
  queries.  If the structure appears to be valid, no error is raised.
 </para>
 <para>
  Verification is performed using the same procedures as those used by
  index scans themselves, which may be user-defined operator class
  code.  For example, B-Tree index verification relies on comparisons
  made with one or more B-Tree support function 1 routines.  See <xref
  linkend="xindex-support"/> for details of operator class support
  functions.
 </para>
 <para>
  Unlike the B-Tree checking functions, which report corruption by raising
  errors, the heap checking function <function>verify_heapam</function> checks
  a table and attempts to return a set of rows, one row per corruption
  detected.  However, if facilities that
  <function>verify_heapam</function> relies upon are themselves corrupted, the
  function may be unable to continue and may instead raise an error.
 </para>
 <para>
  Permission to execute <filename>amcheck</filename> functions may be granted
  to non-superusers, but before granting such permissions careful consideration
  should be given to data security and privacy concerns.  Although the
  corruption reports generated by these functions do not focus on the contents
  of the corrupted data so much as on the structure of that data and the nature
  of the corruptions found, an attacker who gains permission to execute these
  functions, particularly if the attacker can also induce corruption, might be
  able to infer something of the data itself from such messages.
 </para>
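 <para>
  As a minimal sketch, execute permission on a single function can be
  granted with the standard <command>GRANT</command> command.  The role name
  <literal>amcheck_user</literal> below is hypothetical, and the argument
  types must match the signature of the function being granted:
 </para>
<programlisting>
-- "amcheck_user" is a hypothetical role.  Only the two-argument form of
-- bt_index_check is granted here; other functions need their own GRANT.
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO amcheck_user;
</programlisting>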

 <sect2>
  <title>Functions</title>

  <variablelist>
   <varlistentry>
    <term>
     <function>bt_index_check(index regclass, heapallindexed boolean) returns void</function>
     <indexterm>
      <primary>bt_index_check</primary>
     </indexterm>
    </term>

    <listitem>
     <para>
      <function>bt_index_check</function> tests that its target, a
      B-Tree index, respects a variety of invariants.  Example usage:
<screen>
test=# SELECT bt_index_check(index =&gt; c.oid, heapallindexed =&gt; i.indisunique),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |             relname             | relpages
----------------+---------------------------------+----------
                | pg_depend_reference_index       |       43
                | pg_depend_depender_index        |       40
                | pg_proc_proname_args_nsp_index  |       31
                | pg_description_o_c_o_index      |       21
                | pg_attribute_relid_attnam_index |       14
                | pg_proc_oid_index               |       10
                | pg_attribute_relid_attnum_index |        9
                | pg_amproc_fam_proc_index        |        5
                | pg_amop_opr_fam_index           |        5
                | pg_amop_fam_strat_index         |        5
(10 rows)
</screen>
      This example shows a session that performs verification of the
      10 largest catalog indexes in the database <quote>test</quote>.
      Verification of the presence of heap tuples as index tuples is
      requested for the subset that are unique indexes.  Since no
      error is raised, all indexes tested appear to be logically
      consistent.  Naturally, this query could easily be changed to
      call <function>bt_index_check</function> for every index in the
      database where verification is supported.
     </para>
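     <para>
      As one possible generalization (a sketch, not the only approach), the
      query above can be extended to every B-Tree index in the database by
      dropping the <literal>pg_catalog</literal> schema restriction.
      Passing <literal>false</literal> for
      <parameter>heapallindexed</parameter> here is an arbitrary choice
      that favors speed:
     </para>
<programlisting>
-- Check every ready, valid, non-temporary B-Tree index in the database.
SELECT bt_index_check(index =&gt; c.oid, heapallindexed =&gt; false),
       n.nspname,
       c.relname
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
AND c.relpersistence != 't'
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;
</programlisting>
     <para>
      Because <function>bt_index_check</function> raises an error on the
      first inconsistency it finds, the whole statement fails at that
      point, and any indexes not yet checked are skipped.
     </para>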
     <para>
      <function>bt_index_check</function> acquires an <literal>AccessShareLock</literal>
      on the target index and the heap relation it belongs to. This lock mode
      is the same lock mode acquired on relations by simple
      <literal>SELECT</literal> statements.
      <function>bt_index_check</function> does not verify invariants
      that span child/parent relationships, but will verify the
      presence of all heap tuples as index tuples within the index
      when <parameter>heapallindexed</parameter> is
      <literal>true</literal>.  When a routine, lightweight test for
      corruption is required in a live production environment, using
      <function>bt_index_check</function> often provides the best
      trade-off between thoroughness of verification and limiting the
      impact on application performance and availability.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>
     <function>bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void</function>
     <indexterm>
      <primary>bt_index_parent_check</primary>
     </indexterm>
    </term>

    <listitem>
     <para>
      <function>bt_index_parent_check</function> tests that its
      target, a B-Tree index, respects a variety of invariants.
      Optionally, when the <parameter>heapallindexed</parameter>
      argument is <literal>true</literal>, the function verifies the
      presence of all heap tuples that should be found within the
      index.  When the optional <parameter>rootdescend</parameter>
      argument is <literal>true</literal>, verification re-finds
      tuples on the leaf level by performing a new search from the
      root page for each tuple.  The checks that can be performed by
      <function>bt_index_parent_check</function> are a superset of the
      checks that can be performed by <function>bt_index_check</function>.
      <function>bt_index_parent_check</function> can be thought of as
      a more thorough variant of <function>bt_index_check</function>:
      unlike <function>bt_index_check</function>,
      <function>bt_index_parent_check</function> also checks
      invariants that span parent/child relationships, including checking
      that there are no missing downlinks in the index structure.
      <function>bt_index_parent_check</function> follows the general
      convention of raising an error if it finds a logical
      inconsistency or other problem.
     </para>
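     <para>
      For example, assuming a hypothetical index named
      <literal>mytable_pkey</literal>, both optional checks can be requested
      using named notation:
     </para>
<programlisting>
-- "mytable_pkey" is a hypothetical index name.
-- heapallindexed: also verify that every heap tuple that should be
--                 indexed is present in the index.
-- rootdescend:    re-find each leaf-level tuple with a new search that
--                 starts from the root page.
SELECT bt_index_parent_check(index =&gt; 'mytable_pkey',
                             heapallindexed =&gt; true,
                             rootdescend =&gt; true);
</programlisting>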
     <para>
      A <literal>ShareLock</literal> is required on the target index by
      <function>bt_index_parent_check</function> (a
      <literal>ShareLock</literal> is also acquired on the heap relation).
      These locks prevent concurrent data modification from
      <command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command>
      commands.  The locks also prevent the underlying relation from
      being concurrently processed by <command>VACUUM</command>, as well as
      all other utility commands.  Note that the function holds locks
      only while running, not for the entire transaction.
     </para>
     <para>
      <function>bt_index_parent_check</function>'s additional
      verification is more likely to detect various pathological
      cases.  These cases may involve an incorrectly implemented
      B-Tree operator class used by the index that is checked, or,
      hypothetically, undiscovered bugs in the underlying B-Tree index
      access method code.  Note that
      <function>bt_index_parent_check</function> cannot be used when
      Hot Standby mode is enabled (i.e., on read-only physical
      replicas), unlike <function>bt_index_check</function>.
     </para>
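     <para>
      A script that may run on either a primary or a Hot Standby replica
      could choose the function at run time.  The following
      <command>DO</command> block is only a sketch, again assuming a
      hypothetical index named <literal>mytable_pkey</literal>:
     </para>
<programlisting>
DO $$
BEGIN
    IF pg_is_in_recovery() THEN
        -- Hot Standby: bt_index_parent_check cannot be used here.
        PERFORM bt_index_check(index =&gt; 'mytable_pkey',
                               heapallindexed =&gt; true);
    ELSE
        -- Primary: also check parent/child invariants, at the cost of a
        -- ShareLock that blocks concurrent writes while the check runs.
        PERFORM bt_index_parent_check(index =&gt; 'mytable_pkey',
                                      heapallindexed =&gt; true);
    END IF;
END
$$;
</programlisting>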
    </listitem>
   </varlistentry>
  </variablelist>
  <tip>
   <para>
    <function>bt_index_check</function> and
    <function>bt_index_parent_check</function> both output log
    messages about the verification process at
    <literal>DEBUG1</literal> and <literal>DEBUG2</literal> severity
    levels.  These messages provide detailed information about the
    verification process that may be of interest to
    <productname>PostgreSQL</productname> developers.  Advanced users
    may also find this information helpful, since it provides
    additional context should verification actually detect an
    inconsistency.  Running:
<programlisting>
SET client_min_messages = DEBUG1;
</programlisting>
    in an interactive <application>psql</application> session before
    running a verification query will display messages about the
    progress of verification with a manageable level of detail.
   </para>
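   <para>
    A complete session might look like this sketch, where
    <literal>mytable_pkey</literal> is a hypothetical index name and
    <command>RESET</command> returns <varname>client_min_messages</varname>
    to its session default afterwards:
   </para>
<programlisting>
SET client_min_messages = DEBUG1;
SELECT bt_index_check(index =&gt; 'mytable_pkey', heapallindexed =&gt; true);
RESET client_min_messages;
</programlisting>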
  </tip>

  <variablelist>
   <varlistentry>
    <term>
     <function>
      verify_heapam(relation regclass,
                    on_error_stop boolean,
                    check_toast boolean,
                    skip cstring,
                    startblock bigint,
                    endblock bigint,
                    blkno OUT bigint,
                    offnum OUT integer,
                    attnum OUT integer,
                    msg OUT text)
      returns record
     </function>
    </term>
    <listitem>
     <para>
      Checks a table for structural corruption, where pages in the relation
      contain data that is invalidly formatted, and for logical corruption,
      where pages are structurally valid but inconsistent with the rest of the
      database cluster.  Example usage:
<screen>
test=# select * from verify_heapam('mytable', check_toast := true);
 blkno | offnum | attnum |                                                msg
-------+--------+--------+--------------------------------------------------------------------------------------------------
    17 |     12 |        | xmin 4294967295 precedes relation freeze threshold 17:1134217582
   960 |      4 |        | data begins at offset 152 beyond the tuple length 58
   960 |      4 |        | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
   960 |      5 |        | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
   960 |      6 |        | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
   960 |      7 |        | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
  1147 |      2 |        | number of attributes 2047 exceeds maximum expected for table 3
  1147 |     10 |        | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
  1147 |     15 |        | number of attributes 67 exceeds maximum expected for table 3
  1147 |     16 |      1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
  1147 |     18 |      2 | final toast chunk number 0 differs from expected value 6
  1147 |     19 |      2 | toasted value for attribute 2 missing from toast table
  1147 |     21 |        | tuple is marked as only locked, but also claims key columns were updated
  1147 |     22 |        | multitransaction ID 1775655 is from before relation cutoff 2355572
(14 rows)
</screen>
      As this example shows, the Tuple ID (TID) of the corrupt tuple is given
      in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
      for corruptions specific to a particular attribute in the tuple, the
      <literal>attnum</literal> field shows which one.
     </para>
     <para>
      Structural corruption can happen due to faulty storage hardware, or
      relation files being overwritten or modified by unrelated software.
      This kind of corruption can also be detected with
      <link linkend="app-initdb-data-checksums"><application>data page
      checksums</application></link>.
     </para>
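     <para>
      Whether data page checksums are enabled in the current cluster can be
      checked with the read-only <varname>data_checksums</varname> setting:
     </para>
<programlisting>
-- Returns "on" when data page checksums are enabled, "off" otherwise.
SHOW data_checksums;
</programlisting>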
     <para>
      Relation pages which are correctly formatted, internally consistent, and
      correct relative to their own internal checksums may still contain
      logical corruption.  As such, this kind of corruption cannot be detected
      with <application>checksums</application>.  Examples include toasted
      values in the main table which lack a corresponding entry in the toast
      table, and tuples in the main table with a Transaction ID that is older
      than the oldest valid Transaction ID in the database or cluster.
     </para>
     <para>
      Multiple causes of logical corruption have been observed in production
      systems, including bugs in the <productname>PostgreSQL</productname>
      server software, faulty and ill-conceived backup and restore tools, and
      user error.
     </para>
     <para>
      Corrupt relations are most concerning in live production environments,
      precisely the same environments where high risk activities are least
      welcome.  For this reason, <function>verify_heapam</function> has been
      designed to diagnose corruption without undue risk.  It cannot guard
      against all causes of backend crashes, as even executing the calling
      query could be unsafe on a badly corrupted system.  The function also
      reads the <link linkend="catalogs-overview">catalog tables</link>, which
      could be problematic if the catalogs themselves are corrupted.
     </para>
     <para>
      The design principle adhered to in <function>verify_heapam</function> is
      that, if the rest of the system and server hardware are correct, under
      default options, <function>verify_heapam</function> will not crash the
      server due merely to structural or logical corruption in the target
      table.
     </para>
     <para>
      The <literal>check_toast</literal> option attempts to reconcile the target
      table against entries in its corresponding toast table.  This option is
      disabled by default and is known to be slow.
      If the target relation's corresponding toast table or toast index is
      corrupt, reconciling the target table against toast values could
      conceivably crash the server, although in many cases this would
      just produce an error.
     </para>
     <para>
      The following optional arguments are recognized:
     </para>
     <variablelist>
      <varlistentry>
       <term>on_error_stop</term>
       <listitem>
        <para>
         If true, corruption checking stops at the end of the first block on
         which any corruptions are found.
        </para>
        <para>
         Defaults to false.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>check_toast</term>
       <listitem>
        <para>
         If true, toasted values are checked against the corresponding
         TOAST table.
        </para>
        <para>
         Defaults to false.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>skip</term>
       <listitem>
        <para>
         If not <literal>none</literal>, corruption checking skips blocks that
         are marked as all-visible or all-frozen, as given.
         Valid options are <literal>all-visible</literal>,
         <literal>all-frozen</literal> and <literal>none</literal>.
        </para>
        <para>
         Defaults to <literal>none</literal>.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>startblock</term>
       <listitem>
        <para>
         If specified, corruption checking begins at the specified block,
         skipping all previous blocks.  It is an error to specify a
         <literal>startblock</literal> outside the range of blocks in the
         target table.
        </para>
        <para>
         By default, does not skip any blocks.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>endblock</term>
       <listitem>
        <para>
         If specified, corruption checking ends at the specified block,
         skipping all remaining blocks.  It is an error to specify an
         <literal>endblock</literal> outside the range of blocks in the target
         table.
        </para>
        <para>
         By default, does not skip any blocks.
        </para>
       </listitem>
      </varlistentry>
     </variablelist>
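     <para>
      As an illustrative sketch, the optional arguments can be combined to
      check a large table in bounded chunks.  The table name
      <literal>mytable</literal> is hypothetical, and the table is assumed to
      be non-empty, because even block 0 would otherwise be out of range.
      <literal>endblock</literal> is capped at the last block of the table,
      computed from <function>pg_relation_size</function> and the
      <varname>block_size</varname> setting:
     </para>
<programlisting>
-- Check at most the first 1000 blocks of "mytable", skipping blocks
-- marked all-frozen and stopping after the first block on which any
-- corruption is found.
SELECT *
FROM verify_heapam('mytable',
                   on_error_stop := true,
                   skip := 'all-frozen',
                   startblock := 0,
                   endblock := least(999,
                                     pg_relation_size('mytable') /
                                     current_setting('block_size')::bigint - 1));
</programlisting>
     <para>
      Subsequent chunks can be checked by advancing
      <literal>startblock</literal> and <literal>endblock</literal> in the
      same way.
     </para>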
     <para>
      For each corruption detected, <function>verify_heapam</function> returns
      a row with the following columns:
     </para>
     <variablelist>
      <varlistentry>
       <term>blkno</term>
       <listitem>
        <para>
         The block number of the corrupt page.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>offnum</term>
       <listitem>
        <para>
         The OffsetNumber of the corrupt tuple.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>attnum</term>
       <listitem>
        <para>
         The attribute number of the corrupt column in the tuple, if the
         corruption is specific to a column and not the tuple as a whole.
        </para>
       </listitem>
      </varlistentry>
      <varlistentry>
       <term>msg</term>
       <listitem>
        <para>
         A human-readable message describing the corruption in the page.
        </para>
       </listitem>
      </varlistentry>
     </variablelist>
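     <para>
      As a sketch of how these columns might be combined, the following query
      assumes a hypothetical table named <literal>mytable</literal>, builds a
      <type>tid</type> from <literal>blkno</literal> and
      <literal>offnum</literal>, and looks up the column name for
      <literal>attnum</literal> in <structname>pg_attribute</structname>:
     </para>
<programlisting>
SELECT CASE WHEN v.offnum IS NOT NULL
            THEN format('(%s,%s)', v.blkno, v.offnum)::tid
       END AS tid,
       a.attname,   -- NULL when the corruption is not column-specific
       v.msg
FROM verify_heapam('mytable') AS v
LEFT JOIN pg_attribute AS a
       ON a.attrelid = 'mytable'::regclass
      AND a.attnum = v.attnum
ORDER BY v.blkno, v.offnum;
</programlisting>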
    </listitem>
   </varlistentry>
  </variablelist>
 </sect2>

 <sect2>
  <title>Optional <parameter>heapallindexed</parameter> Verification</title>
 <para>
  When the <parameter>heapallindexed</parameter> argument to B-Tree
  verification functions is <literal>true</literal>, an additional
  phase of verification is performed against the table associated with
  the target index relation.  This consists of a <quote>dummy</quote>
  <command>CREATE INDEX</command> operation, which checks for the
  presence of all hypothetical new index tuples against a temporary,
  in-memory summarizing structure (this is built when needed during
  the basic first phase of verification).  The summarizing structure
  <quote>fingerprints</quote> every tuple found within the target
  index.  The high level principle behind
  <parameter>heapallindexed</parameter> verification is that a new
  index that is equivalent to the existing, target index must only
  have entries that can be found in the existing structure.
 </para>
 <para>
  The additional <parameter>heapallindexed</parameter> phase adds
  significant overhead: verification will typically take several times
  longer.  However, there is no change to the relation-level locks
  acquired when <parameter>heapallindexed</parameter> verification is
  performed.
 </para>
 <para>
  The summarizing structure is bounded in size by
  <varname>maintenance_work_mem</varname>.  In order to ensure that
  there is no more than a 2% probability of failure to detect an
  inconsistency for each heap tuple that should be represented in the
  index, approximately 2 bytes of memory are needed per tuple.  As
  less memory is made available per tuple, the probability of missing
  an inconsistency slowly increases.  This approach limits the
  overhead of verification significantly, while only slightly reducing
  the probability of detecting a problem, especially for installations
  where verification is treated as a routine maintenance task.  Any
  single absent or malformed tuple has a new opportunity to be
  detected with each new verification attempt.
 </para>
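 <para>
  A rough sketch of applying this rule of thumb before a
  <parameter>heapallindexed</parameter> check follows; the table
  <literal>mytable</literal>, its index <literal>mytable_pkey</literal>, and
  the <literal>1GB</literal> setting are hypothetical.  Note that
  <structfield>reltuples</structfield> is only an estimate maintained by
  <command>VACUUM</command> and <command>ANALYZE</command>, and is unreliable
  for a table that has never been processed by either:
 </para>
<programlisting>
-- Estimate the memory needed at about 2 bytes per heap tuple.
SELECT c.reltuples::bigint AS estimated_tuples,
       pg_size_pretty(c.reltuples::bigint * 2) AS approx_memory_needed,
       current_setting('maintenance_work_mem') AS maintenance_work_mem
FROM pg_class AS c
WHERE c.oid = 'mytable'::regclass;

-- If needed, raise the limit for this session only, then verify.
SET maintenance_work_mem = '1GB';
SELECT bt_index_check(index =&gt; 'mytable_pkey', heapallindexed =&gt; true);
</programlisting>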

 </sect2>

 <sect2>
  <title>Using <filename>amcheck</filename> Effectively</title>

 <para>
  <filename>amcheck</filename> can be effective at detecting various types of
  failure modes that <link
  linkend="app-initdb-data-checksums"><application>data page
  checksums</application></link> will always fail to catch.  These include:

  <itemizedlist>
   <listitem>
    <para>
     Structural inconsistencies caused by incorrect operator class
     implementations.
    </para>
    <para>
     This includes issues caused by the comparison rules of operating
     system collations changing. Comparisons of datums of a collatable
     type like <type>text</type> must be immutable (just as all
     comparisons used for B-Tree index scans must be immutable), which
     implies that operating system collation rules must never change.
     Though rare, updates to operating system collation rules can
     cause these issues. More commonly, an inconsistency in the
     collation order between a primary server and a standby server is
     implicated, possibly because the <emphasis>major</emphasis> operating
     system version in use is inconsistent.  Such inconsistencies will
     generally arise only on standby servers, and so can usually be
     detected only on standby servers.
    </para>
    <para>
     If a problem like this arises, it may not affect each individual
     index that is ordered using an affected collation, simply because
     <emphasis>indexed</emphasis> values might happen to have the same
     absolute ordering regardless of the behavioral inconsistency. See
     <xref linkend="locale"/> and <xref linkend="collation"/> for
     further details about how <productname>PostgreSQL</productname> uses
     operating system locales and collations.
    </para>
   </listitem>
   <listitem>
    <para>
     Structural inconsistencies between indexes and the heap relations
     that are indexed (when <parameter>heapallindexed</parameter>
     verification is performed).
    </para>
    <para>
     There is no cross-checking of indexes against their heap relation
     during normal operation.  Symptoms of heap corruption can be subtle.
    </para>
   </listitem>
   <listitem>
    <para>
     Corruption caused by hypothetical undiscovered bugs in the
     underlying <productname>PostgreSQL</productname> access method
     code, sort code, or transaction management code.
    </para>
    <para>
     Automatic verification of the structural integrity of indexes
     plays a role in the general testing of new or proposed
     <productname>PostgreSQL</productname> features that could plausibly allow a
     logical inconsistency to be introduced.  Verification of table
     structure and associated visibility and transaction status
     information plays a similar role.  One obvious testing strategy
     is to call <filename>amcheck</filename> functions continuously
     when running the standard regression tests.  See <xref
     linkend="regress-run"/> for details on running the tests.
    </para>
   </listitem>
   <listitem>
    <para>
     File system or storage subsystem faults in installations where data
     page checksums are not enabled.
    </para>
    <para>
     Note that if a block is already present in shared buffers when it is
     accessed, <filename>amcheck</filename> examines the page as represented
     in that shared memory buffer at the time of verification.  Consequently,
     <filename>amcheck</filename> does not necessarily examine data read from the
     file system at the time of verification.  When checksums are
     enabled, <filename>amcheck</filename> may raise an error due to a checksum
     failure when a corrupt block is read into a buffer.
    </para>
   </listitem>
   <listitem>
    <para>
     Corruption caused by faulty RAM, or the broader memory subsystem.
    </para>
    <para>
     <productname>PostgreSQL</productname> does not protect against correctable
     memory errors and it is assumed you will operate using RAM that
     uses industry standard Error Correcting Codes (ECC) or better
     protection.  However, ECC memory is typically only immune to
     single-bit errors, and should not be assumed to provide
     <emphasis>absolute</emphasis> protection against failures that
     result in memory corruption.
    </para>
    <para>
     When <parameter>heapallindexed</parameter> verification is
     performed, there is generally a greatly increased chance of
     detecting single-bit errors, since strict binary equality is
     tested, and the indexed attributes within the heap are tested.
    </para>
   </listitem>
  </itemizedlist>
  In general, <filename>amcheck</filename> can only prove the presence of
  corruption; it cannot prove its absence.
 </para>
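 <para>
  Since collation-related inconsistencies usually surface only on
  standby servers, a targeted check to run there might restrict
  verification to B-Tree indexes with at least one key column whose
  collation is neither <literal>C</literal> nor <literal>POSIX</literal>
  (those two collations do not depend on operating system collation
  rules).  The following query is an illustrative sketch only:
 </para>
<programlisting>
-- Intended to run on a standby server.
SELECT bt_index_check(index =&gt; c.oid, heapallindexed =&gt; true),
       n.nspname,
       c.relname
FROM pg_index AS i
JOIN pg_class AS c ON i.indexrelid = c.oid
JOIN pg_namespace AS n ON c.relnamespace = n.oid
JOIN pg_am AS am ON c.relam = am.oid
WHERE am.amname = 'btree'
AND c.relpersistence != 't'
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
-- Keep only indexes with a key column using a non-C, non-POSIX collation.
AND EXISTS (SELECT 1
            FROM unnest(i.indcollation::oid[]) AS ic(colloid)
            JOIN pg_collation AS coll ON coll.oid = ic.colloid
            WHERE coll.collname NOT IN ('C', 'POSIX'));
</programlisting>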

 </sect2>
 <sect2>
  <title>Repairing Corruption</title>
 <para>
  No error concerning corruption raised by <filename>amcheck</filename> should
  ever be a false positive.  <filename>amcheck</filename> raises
  errors in the event of conditions that, by definition, should never
  happen, and so careful analysis of <filename>amcheck</filename>
  errors is often required.
 </para>
 <para>
  There is no general method of repairing problems that
  <filename>amcheck</filename> detects.  An explanation for the root cause of
  an invariant violation should be sought.  <xref
  linkend="pageinspect"/> may play a useful role in diagnosing
  corruption that <filename>amcheck</filename> detects.  A <command>REINDEX</command>
  may not be effective in repairing corruption.
 </para>
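 <para>
  As a starting point for such a diagnosis, the <filename>pageinspect</filename>
  functions <function>bt_metap</function>, <function>bt_page_stats</function>
  and <function>bt_page_items</function> can display the contents of
  B-Tree pages.  In this sketch, <literal>mytable_pkey</literal> is a
  hypothetical index name and block 1 stands in for a block implicated by
  an <filename>amcheck</filename> error message:
 </para>
<programlisting>
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- Metapage: root block number, tree level of the root, and so on.
SELECT * FROM bt_metap('mytable_pkey');
-- Summary of one page, then its individual items.
SELECT * FROM bt_page_stats('mytable_pkey', 1);
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('mytable_pkey', 1);
</programlisting>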

 </sect2>

</sect1>