Don't consider newly inserted tuples in nbtree VACUUM.

Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (on Postgres 13 they are merely disabled, not removed).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
Peter Geoghegan 2021-03-10 16:26:58 -08:00
parent 9a4e4af420
commit 9663d12446
6 changed files with 60 additions and 167 deletions


@@ -8349,47 +8349,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
<varlistentry id="guc-vacuum-cleanup-index-scale-factor" xreflabel="vacuum_cleanup_index_scale_factor">
<term><varname>vacuum_cleanup_index_scale_factor</varname> (<type>floating point</type>)
<indexterm>
<primary><varname>vacuum_cleanup_index_scale_factor</varname></primary>
<secondary>configuration parameter</secondary>
</indexterm>
</term>
<listitem>
<para>
Specifies the fraction of the total number of heap tuples counted in
the previous statistics collection that can be inserted without
incurring an index scan at the <command>VACUUM</command> cleanup stage.
This setting currently applies to B-tree indexes only.
</para>
<para>
If no tuples were deleted from the heap, B-tree indexes are still
scanned at the <command>VACUUM</command> cleanup stage when at least one
of the following conditions is met: the index statistics are stale, or
the index contains deleted pages that can be recycled during cleanup.
Index statistics are considered to be stale if the number of newly
inserted tuples exceeds the <varname>vacuum_cleanup_index_scale_factor</varname>
fraction of the total number of heap tuples detected by the previous
statistics collection. The total number of heap tuples is stored in
the index meta-page. Note that the meta-page does not include this data
until <command>VACUUM</command> finds no dead tuples, so B-tree index
scan at the cleanup stage can only be skipped if the second and
subsequent <command>VACUUM</command> cycles detect no dead tuples.
</para>
<para>
The value can range from <literal>0</literal> to
<literal>10000000000</literal>.
When <varname>vacuum_cleanup_index_scale_factor</varname> is set to
<literal>0</literal>, index scans are never skipped during
<command>VACUUM</command> cleanup. The default value is <literal>0.1</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-bytea-output" xreflabel="bytea_output">
<term><varname>bytea_output</varname> (<type>enum</type>)
<indexterm>


@@ -434,20 +434,6 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</note>
</listitem>
</varlistentry>
<varlistentry id="index-reloption-vacuum-cleanup-index-scale-factor" xreflabel="vacuum_cleanup_index_scale_factor">
<term><literal>vacuum_cleanup_index_scale_factor</literal> (<type>floating point</type>)
<indexterm>
<primary><varname>vacuum_cleanup_index_scale_factor</varname></primary>
<secondary>storage parameter</secondary>
</indexterm>
</term>
<listitem>
<para>
Per-index value for <xref linkend="guc-vacuum-cleanup-index-scale-factor"/>.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>


@@ -168,6 +168,9 @@ _bt_getmeta(Relation rel, Buffer metabuf)
*
* This routine checks if provided cleanup-related information is matching
* to those written in the metapage. On mismatch, metapage is overwritten.
*
* Postgres 13 ignores btm_last_cleanup_num_heap_tuples value here
* following backbranch disabling of vacuum_cleanup_index_scale_factor.
*/
void
_bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
@@ -176,22 +179,15 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
bool needsRewrite = false;
XLogRecPtr recptr;
/* read the metapage and check if it needs rewrite */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
/* outdated version of metapage always needs rewrite */
if (metad->btm_version < BTREE_NOVAC_VERSION)
needsRewrite = true;
else if (metad->btm_oldest_btpo_xact != oldestBtpoXact ||
metad->btm_last_cleanup_num_heap_tuples != numHeapTuples)
needsRewrite = true;
if (!needsRewrite)
/* Don't miss chance to upgrade index/metapage when BTREE_MIN_VERSION */
if (metad->btm_version >= BTREE_NOVAC_VERSION &&
metad->btm_oldest_btpo_xact == oldestBtpoXact)
{
_bt_relbuf(rel, metabuf);
return;
@@ -209,13 +205,14 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
/* update cleanup-related information */
metad->btm_oldest_btpo_xact = oldestBtpoXact;
metad->btm_last_cleanup_num_heap_tuples = numHeapTuples;
metad->btm_last_cleanup_num_heap_tuples = -1;
MarkBufferDirty(metabuf);
/* write wal record if needed */
if (RelationNeedsWAL(rel))
{
xl_btree_metadata md;
XLogRecPtr recptr;
XLogBeginInsert();
XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -227,7 +224,7 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
md.fastroot = metad->btm_fastroot;
md.fastlevel = metad->btm_fastlevel;
md.oldest_btpo_xact = oldestBtpoXact;
md.last_cleanup_num_heap_tuples = numHeapTuples;
md.last_cleanup_num_heap_tuples = -1; /* Disabled */
md.allequalimage = metad->btm_allequalimage;
XLogRegisterBufData(0, (char *) &md, sizeof(xl_btree_metadata));
@@ -238,6 +235,7 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
}
END_CRIT_SECTION();
_bt_relbuf(rel, metabuf);
}


@@ -794,6 +794,9 @@ _bt_parallel_advance_array_keys(IndexScanDesc scan)
* When we return false, VACUUM can even skip the cleanup-only call to
* btvacuumscan (i.e. there will be no btvacuumscan call for this index at
* all). Otherwise, a cleanup-only btvacuumscan call is required.
*
* Postgres 13 ignores btm_last_cleanup_num_heap_tuples value here following
* backbranch disabling of vacuum_cleanup_index_scale_factor.
*/
static bool
_bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
@@ -801,60 +804,44 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
bool result = false;
uint32 btm_version;
TransactionId prev_btm_oldest_btpo_xact;
/*
* Copy details from metapage to local variables quickly.
*
* Note that we deliberately avoid using cached version of metapage here.
*/
metabuf = _bt_getbuf(info->index, BTREE_METAPAGE, BT_READ);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
btm_version = metad->btm_version;
if (metad->btm_version < BTREE_NOVAC_VERSION)
if (btm_version < BTREE_NOVAC_VERSION)
{
/*
* Do cleanup if metapage needs upgrade, because we don't have
* cleanup-related meta-information yet.
* Metapage needs to be dynamically upgraded to store fields that are
* only present when btm_version >= BTREE_NOVAC_VERSION
*/
result = true;
_bt_relbuf(info->index, metabuf);
return true;
}
else if (TransactionIdIsValid(metad->btm_oldest_btpo_xact) &&
TransactionIdPrecedes(metad->btm_oldest_btpo_xact,
RecentGlobalXmin))
prev_btm_oldest_btpo_xact = metad->btm_oldest_btpo_xact;
_bt_relbuf(info->index, metabuf);
if (TransactionIdIsValid(prev_btm_oldest_btpo_xact) &&
TransactionIdPrecedes(prev_btm_oldest_btpo_xact, RecentGlobalXmin))
{
/*
* If any oldest btpo.xact from a previously deleted page in the index
* is older than RecentGlobalXmin, then at least one deleted page can
* be recycled -- don't skip cleanup.
*/
result = true;
}
else
{
BTOptions *relopts;
float8 cleanup_scale_factor;
float8 prev_num_heap_tuples;
/*
* If table receives enough insertions and no cleanup was performed,
* then index would appear have stale statistics. If scale factor is
* set, we avoid that by performing cleanup if the number of inserted
* tuples exceeds vacuum_cleanup_index_scale_factor fraction of
* original tuples count.
*/
relopts = (BTOptions *) info->index->rd_options;
cleanup_scale_factor = (relopts &&
relopts->vacuum_cleanup_index_scale_factor >= 0)
? relopts->vacuum_cleanup_index_scale_factor
: vacuum_cleanup_index_scale_factor;
prev_num_heap_tuples = metad->btm_last_cleanup_num_heap_tuples;
if (cleanup_scale_factor <= 0 ||
prev_num_heap_tuples <= 0 ||
(info->num_heap_tuples - prev_num_heap_tuples) /
prev_num_heap_tuples >= cleanup_scale_factor)
result = true;
return true;
}
_bt_relbuf(info->index, metabuf);
return result;
return false;
}
/*
@@ -907,9 +894,6 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
* still need to do a pass over the index, to recycle any newly-recyclable
* pages or to obtain index statistics. _bt_vacuum_needs_cleanup
* determines if either are needed.
*
* Since we aren't going to actually delete any leaf items, there's no
* need to go through all the vacuum-cycle-ID pushups.
*/
if (stats == NULL)
{
@@ -917,8 +901,23 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
if (!_bt_vacuum_needs_cleanup(info))
return NULL;
/*
* Since we aren't going to actually delete any leaf items, there's no
* need to go through all the vacuum-cycle-ID pushups here.
*
* Posting list tuples are a source of inaccuracy for cleanup-only
* scans. btvacuumscan() will assume that the number of index tuples
* from each page can be used as num_index_tuples, even though
* num_index_tuples is supposed to represent the number of TIDs in the
* index. This naive approach can underestimate the number of tuples
* in the index significantly.
*
* We handle the problem by making num_index_tuples an estimate in
* cleanup-only case.
*/
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
btvacuumscan(info, stats, NULL, NULL, 0);
stats->estimated_count = true;
}
/*
@@ -926,12 +925,6 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
* double-counting some index tuples, so disbelieve any total that exceeds
* the underlying heap's count ... if we know that accurately. Otherwise
* this might just make matters worse.
*
* Posting list tuples are another source of inaccuracy. Cleanup-only
* btvacuumscan calls assume that the number of index tuples can be used
* as num_index_tuples, even though num_index_tuples is supposed to
* represent the number of TIDs in the index. This naive approach can
* underestimate the number of tuples in the index.
*/
if (!info->estimated_count)
{
@@ -971,7 +964,6 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
* Reset counts that will be incremented during the scan; needed in case
* of multiple scans during a single VACUUM command
*/
stats->estimated_count = false;
stats->num_index_tuples = 0;
stats->pages_deleted = 0;
@@ -1059,8 +1051,12 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
IndexFreeSpaceMapVacuum(rel);
/*
* Maintain the oldest btpo.xact and a count of the current number of heap
* tuples in the metapage (for the benefit of _bt_vacuum_needs_cleanup).
* Maintain the oldest btpo.xact using _bt_update_meta_cleanup_info, for
* the benefit of _bt_vacuum_needs_cleanup.
*
* Note: We deliberately don't store the count of heap tuples here
* anymore. The numHeapTuples argument to _bt_update_meta_cleanup_info()
* is left in place on Postgres 13.
*
* The page with the oldest btpo.xact is typically a page deleted by this
* VACUUM operation, since pages deleted by a previous VACUUM operation
@@ -1070,8 +1066,7 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
* statistics, despite not counting as deleted pages for the purposes of
* determining the oldest btpo.xact.)
*/
_bt_update_meta_cleanup_info(rel, vstate.oldestBtpoXact,
info->num_heap_tuples);
_bt_update_meta_cleanup_info(rel, vstate.oldestBtpoXact, -1);
/* update statistics */
stats->num_pages = num_pages;
@@ -1399,7 +1394,10 @@ backtrack:
* We don't count the number of live TIDs during cleanup-only calls to
* btvacuumscan (i.e. when callback is not set). We count the number
* of index tuples directly instead. This avoids the expense of
* directly examining all of the tuples on each page.
* directly examining all of the tuples on each page. VACUUM will
* treat num_index_tuples as an estimate in cleanup-only case, so it
* doesn't matter that this underestimates num_index_tuples
* significantly in some cases.
*/
if (minoff > maxoff)
attempt_pagedel = (blkno == scanblkno);


@@ -308,35 +308,6 @@ alter table btree_tall_tbl alter COLUMN t set storage plain;
create index btree_tall_idx on btree_tall_tbl (t, id) with (fillfactor = 10);
insert into btree_tall_tbl select g, repeat('x', 250)
from generate_series(1, 130) g;
--
-- Test vacuum_cleanup_index_scale_factor
--
-- Simple create
create table btree_test(a int);
create index btree_idx1 on btree_test(a) with (vacuum_cleanup_index_scale_factor = 40.0);
select reloptions from pg_class WHERE oid = 'btree_idx1'::regclass;
reloptions
------------------------------------------
{vacuum_cleanup_index_scale_factor=40.0}
(1 row)
-- Fail while setting improper values
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = -10.0);
ERROR: value -10.0 out of bounds for option "vacuum_cleanup_index_scale_factor"
DETAIL: Valid values are between "0.000000" and "10000000000.000000".
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = 100.0);
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = 'string');
ERROR: invalid value for floating point option "vacuum_cleanup_index_scale_factor": string
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = true);
ERROR: invalid value for floating point option "vacuum_cleanup_index_scale_factor": true
-- Simple ALTER INDEX
alter index btree_idx1 set (vacuum_cleanup_index_scale_factor = 70.0);
select reloptions from pg_class WHERE oid = 'btree_idx1'::regclass;
reloptions
------------------------------------------
{vacuum_cleanup_index_scale_factor=70.0}
(1 row)
--
-- Test for multilevel page deletion
--


@@ -150,25 +150,6 @@ create index btree_tall_idx on btree_tall_tbl (t, id) with (fillfactor = 10);
insert into btree_tall_tbl select g, repeat('x', 250)
from generate_series(1, 130) g;
--
-- Test vacuum_cleanup_index_scale_factor
--
-- Simple create
create table btree_test(a int);
create index btree_idx1 on btree_test(a) with (vacuum_cleanup_index_scale_factor = 40.0);
select reloptions from pg_class WHERE oid = 'btree_idx1'::regclass;
-- Fail while setting improper values
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = -10.0);
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = 100.0);
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = 'string');
create index btree_idx_err on btree_test(a) with (vacuum_cleanup_index_scale_factor = true);
-- Simple ALTER INDEX
alter index btree_idx1 set (vacuum_cleanup_index_scale_factor = 70.0);
select reloptions from pg_class WHERE oid = 'btree_idx1'::regclass;
--
-- Test for multilevel page deletion
--