Don't overlook indexes during parallel VACUUM.

Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing.  Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely.  Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed.  These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all.  Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples.  This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.

To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff.  Also document the design of parallel VACUUM with these
below-size-cutoff indexes.

It's unclear how many users might be affected by this bug.  There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff.  Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing.  Note also that autovacuum was not affected, since it never
uses parallel processing.

Test case based on tests from a larger patch to test parallel VACUUM by
Masahiko Sawada.

Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
This commit is contained in:
Peter Geoghegan 2021-11-02 12:06:17 -07:00
parent f3d4019da5
commit 9bacec15b6
5 changed files with 132 additions and 26 deletions

View File

@ -452,7 +452,7 @@ static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static int compute_parallel_vacuum_workers(LVRelState *vacrel,
int nrequested,
bool *can_parallel_vacuum);
bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
BlockNumber nblocks,
@ -2636,8 +2636,8 @@ do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
vacrel->lps->lvshared->first_time = false;
/*
* We can only provide an approximate value of num_heap_tuples in vacuum
* cases.
* We can only provide an approximate value of num_heap_tuples, at least
* for now. Matches serial VACUUM case.
*/
vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
vacrel->lps->lvshared->estimated_count = true;
@ -2825,7 +2825,7 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
if (idx >= vacrel->nindexes)
break;
/* Get the index statistics of this index from DSM */
/* Get the index statistics space from DSM, if any */
shared_istat = parallel_stats_for_idx(lvshared, idx);
/* Skip indexes not participating in parallelism */
@ -2858,8 +2858,15 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
}
/*
* Vacuum or cleanup indexes that can be processed by only the leader process
* because these indexes don't support parallel operation at that phase.
* Perform parallel processing of indexes in leader process.
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
* on details like whether we're performing for_cleanup processing right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
* enforced by compute_parallel_vacuum_workers(). These indexes never get a
* slot for statistics in DSM.
*/
static void
do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
@ -2879,17 +2886,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
IndexBulkDeleteResult *istat;
shared_istat = parallel_stats_for_idx(lvshared, idx);
/* Skip already-complete indexes */
if (shared_istat != NULL)
continue;
indrel = vacrel->indrels[idx];
/*
* We're only here for the unsafe indexes
* We're only here for the indexes that parallel workers won't
* process. Note that the shared_istat test ensures that we process
* indexes that fell under initial size cutoff.
*/
if (parallel_processing_is_safe(indrel, lvshared))
if (shared_istat != NULL &&
parallel_processing_is_safe(indrel, lvshared))
continue;
/* Do vacuum or cleanup of the index */
@ -3730,12 +3735,12 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
* the number of indexes that support parallel vacuum. This function also
* sets can_parallel_vacuum to remember indexes that participate in parallel
* sets will_parallel_vacuum to remember indexes that participate in parallel
* vacuum.
*/
static int
compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
bool *can_parallel_vacuum)
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
@ -3761,7 +3766,7 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
continue;
can_parallel_vacuum[idx] = true;
will_parallel_vacuum[idx] = true;
if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
nindexes_parallel_bulkdel++;
@ -3839,7 +3844,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *can_parallel_vacuum;
bool *will_parallel_vacuum;
long maxtuples;
Size est_shared;
Size est_deadtuples;
@ -3857,15 +3862,15 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
/*
* Compute the number of parallel vacuum workers to launch
*/
can_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
parallel_workers = compute_parallel_vacuum_workers(vacrel,
nrequested,
can_parallel_vacuum);
will_parallel_vacuum);
/* Can't perform vacuum in parallel */
if (parallel_workers <= 0)
{
pfree(can_parallel_vacuum);
pfree(will_parallel_vacuum);
return lps;
}
@ -3893,7 +3898,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
/* Skip indexes that don't participate in parallel vacuum */
if (!can_parallel_vacuum[idx])
if (!will_parallel_vacuum[idx])
continue;
if (indrel->rd_indam->amusemaintenanceworkmem)
@ -3970,7 +3975,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
for (int idx = 0; idx < nindexes; idx++)
{
if (!can_parallel_vacuum[idx])
if (!will_parallel_vacuum[idx])
continue;
/* Set NOT NULL as this index does support parallelism */
@ -4013,7 +4018,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
pfree(can_parallel_vacuum);
pfree(will_parallel_vacuum);
return lps;
}
@ -4043,8 +4048,8 @@ end_parallel_vacuum(LVRelState *vacrel)
shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
/*
* Skip unused slot. The statistics of this index are already stored
* in local memory.
* Skip index -- it must have been processed by the leader, from
* inside do_serial_processing_for_unsafe_indexes()
*/
if (shared_istat == NULL)
continue;
@ -4068,6 +4073,11 @@ end_parallel_vacuum(LVRelState *vacrel)
/*
* Return shared memory statistics for index at offset 'getidx', if any
*
* Returning NULL indicates that compute_parallel_vacuum_workers() determined
* that the index is a totally unsuitable target for all parallel processing
* up front. For example, the index could be < min_parallel_index_scan_size
* cutoff.
*/
static LVSharedIndStats *
parallel_stats_for_idx(LVShared *lvshared, int getidx)

View File

@ -40,7 +40,7 @@
/*
* bulkdelete can be performed in parallel. This option can be used by
* IndexAm's that need to scan the index to delete the tuples.
* index AMs that need to scan indexes to delete tuples.
*/
#define VACUUM_OPTION_PARALLEL_BULKDEL (1 << 0)

View File

@ -0,0 +1,49 @@
SET max_parallel_maintenance_workers TO 4;
SET min_parallel_index_scan_size TO '128kB';
-- Bug #17245: Make sure that we don't totally fail to VACUUM individual indexes that
-- happen to be below min_parallel_index_scan_size during parallel VACUUM:
CREATE TABLE parallel_vacuum_table (a int) WITH (autovacuum_enabled = off);
INSERT INTO parallel_vacuum_table SELECT i from generate_series(1, 10000) i;
-- Parallel VACUUM will never be used unless there are at least two indexes
-- that exceed min_parallel_index_scan_size. Create two such indexes, and
-- a third index that is smaller than min_parallel_index_scan_size.
CREATE INDEX regular_sized_index ON parallel_vacuum_table(a);
CREATE INDEX typically_sized_index ON parallel_vacuum_table(a);
-- Note: vacuum_in_leader_small_index can apply deduplication, making it ~3x
-- smaller than the other indexes
CREATE INDEX vacuum_in_leader_small_index ON parallel_vacuum_table((1));
-- Verify (as best we can) that the cost model for parallel VACUUM
-- will make our VACUUM run in parallel, while always leaving it up to the
-- parallel leader to handle the vacuum_in_leader_small_index index:
SELECT EXISTS (
SELECT 1
FROM pg_class
WHERE oid = 'vacuum_in_leader_small_index'::regclass AND
pg_relation_size(oid) <
pg_size_bytes(current_setting('min_parallel_index_scan_size'))
) as leader_will_handle_small_index;
leader_will_handle_small_index
--------------------------------
t
(1 row)
SELECT count(*) as trigger_parallel_vacuum_nindexes
FROM pg_class
WHERE oid in ('regular_sized_index'::regclass, 'typically_sized_index'::regclass) AND
pg_relation_size(oid) >=
pg_size_bytes(current_setting('min_parallel_index_scan_size'));
trigger_parallel_vacuum_nindexes
----------------------------------
2
(1 row)
-- Parallel VACUUM with B-Tree page deletions, ambulkdelete calls:
DELETE FROM parallel_vacuum_table;
VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
-- Since vacuum_in_leader_small_index uses deduplication, we expect an
-- assertion failure with bug #17245 (in the absence of bugfix):
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios

View File

@ -96,6 +96,7 @@ test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8
# run by itself so it can run parallel workers
test: select_parallel
test: write_parallel
test: vacuum_parallel
# no relation related tests can be put in this group
test: publication subscription

View File

@ -0,0 +1,46 @@
SET max_parallel_maintenance_workers TO 4;
SET min_parallel_index_scan_size TO '128kB';
-- Bug #17245: Make sure that we don't totally fail to VACUUM individual indexes that
-- happen to be below min_parallel_index_scan_size during parallel VACUUM:
CREATE TABLE parallel_vacuum_table (a int) WITH (autovacuum_enabled = off);
INSERT INTO parallel_vacuum_table SELECT i from generate_series(1, 10000) i;
-- Parallel VACUUM will never be used unless there are at least two indexes
-- that exceed min_parallel_index_scan_size. Create two such indexes, and
-- a third index that is smaller than min_parallel_index_scan_size.
CREATE INDEX regular_sized_index ON parallel_vacuum_table(a);
CREATE INDEX typically_sized_index ON parallel_vacuum_table(a);
-- Note: vacuum_in_leader_small_index can apply deduplication, making it ~3x
-- smaller than the other indexes
CREATE INDEX vacuum_in_leader_small_index ON parallel_vacuum_table((1));
-- Verify (as best we can) that the cost model for parallel VACUUM
-- will make our VACUUM run in parallel, while always leaving it up to the
-- parallel leader to handle the vacuum_in_leader_small_index index:
SELECT EXISTS (
SELECT 1
FROM pg_class
WHERE oid = 'vacuum_in_leader_small_index'::regclass AND
pg_relation_size(oid) <
pg_size_bytes(current_setting('min_parallel_index_scan_size'))
) as leader_will_handle_small_index;
SELECT count(*) as trigger_parallel_vacuum_nindexes
FROM pg_class
WHERE oid in ('regular_sized_index'::regclass, 'typically_sized_index'::regclass) AND
pg_relation_size(oid) >=
pg_size_bytes(current_setting('min_parallel_index_scan_size'));
-- Parallel VACUUM with B-Tree page deletions, ambulkdelete calls:
DELETE FROM parallel_vacuum_table;
VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
-- Since vacuum_in_leader_small_index uses deduplication, we expect an
-- assertion failure with bug #17245 (in the absence of bugfix):
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios