postgresql/src/backend/access/heap/vacuumlazy.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

3274 lines
105 KiB
C
Raw Normal View History

/*-------------------------------------------------------------------------
*
* vacuumlazy.c
* Concurrent ("lazy") vacuuming.
*
* The major space usage for vacuuming is storage for the array of dead TIDs
* that are to be removed from indexes. We want to ensure we can vacuum even
* the very largest relations with finite memory space usage. To do that, we
* set upper bounds on the number of TIDs we can keep track of at once.
*
* We are willing to use at most maintenance_work_mem (or perhaps
* autovacuum_work_mem) memory space to keep track of dead TIDs. We initially
* allocate an array of TIDs of that size, with an upper limit that depends on
* table size (this limit ensures we don't allocate a huge area uselessly for
* vacuuming small tables). If the array threatens to overflow, we must call
* lazy_vacuum to vacuum indexes (and to vacuum the pages that we've pruned).
* This frees up the memory space dedicated to storing dead TIDs.
*
* In practice VACUUM will often complete its initial pass over the target
* heap relation without ever running out of space to store TIDs. This means
* that there only needs to be one call to lazy_vacuum, after the initial pass
* completes.
*
* Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* src/backend/access/heap/vacuumlazy.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <math.h>
#include "access/amapi.h"
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
#include "access/multixact.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
#include "access/xlog.h"
#include "catalog/index.h"
#include "catalog/storage.h"
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "tcop/tcopprot.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
#include "utils/timestamp.h"
/*
* Space/time tradeoff parameters: do these need to be user-tunable?
*
* To consider truncating the relation, we want there to be at least
* REL_TRUNCATE_MINIMUM or (relsize / REL_TRUNCATE_FRACTION) (whichever
* is less) potentially-freeable pages.
*/
#define REL_TRUNCATE_MINIMUM 1000
#define REL_TRUNCATE_FRACTION 16
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Timing parameters for truncate locking heuristics.
*
* These were not exposed as user tunable GUC values because it didn't seem
* that the potential for improvement was great enough to merit the cost of
* supporting them.
*/
#define VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */
#define VACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
/*
* Threshold that controls whether we bypass index vacuuming and heap
* vacuuming as an optimization
*/
#define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/*
* Perform a failsafe check every 4GB during the heap scan, approximately
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*/
#define FAILSAFE_EVERY_PAGES \
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
((BlockNumber) (((uint64) 4 * 1024 * 1024 * 1024) / BLCKSZ))
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
/*
* When a table has no indexes, vacuum the FSM after every 8GB, approximately
* (it won't be exact because we only vacuum FSM after processing a heap page
* that has some removable tuples). When there are indexes, this is ignored,
* and we vacuum FSM after each index/heap cleaning pass.
*/
#define VACUUM_FSM_EVERY_PAGES \
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
/*
* Before we consider skipping a page that's marked as clean in
* visibility map, we must've seen at least this many clean pages.
*/
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
/*
* Size of the prefetch window for lazy vacuum backwards truncation scan.
* Needs to be a power of 2.
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
{
VACUUM_ERRCB_PHASE_UNKNOWN,
VACUUM_ERRCB_PHASE_SCAN_HEAP,
VACUUM_ERRCB_PHASE_VACUUM_INDEX,
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
typedef struct LVRelState
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Target heap relation and its indexes */
Relation rel;
Relation *indrels;
int nindexes;
/* Wraparound failsafe has been triggered? */
bool failsafe_active;
/* Consider index vacuuming bypass optimization? */
bool consider_bypass_optimization;
/* Doing index vacuuming, index cleanup, rel truncation? */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
bool do_index_vacuuming;
bool do_index_cleanup;
bool do_rel_truncate;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Buffer access strategy and parallel vacuum state */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
MultiXactId relminmxid;
double old_live_tuples; /* previous value of pg_class.reltuples */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* VACUUM operation's cutoff for pruning */
TransactionId OldestXmin;
/* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */
TransactionId FreezeLimit;
MultiXactId MultiXactCutoff;
/* Error reporting state */
char *relnamespace;
char *relname;
char *indname;
BlockNumber blkno; /* used only for heap operations */
OffsetNumber offnum; /* used only for heap operations */
VacErrPhase phase;
/*
* State managed by lazy_scan_heap() follows.
*
* dead_items stores TIDs whose index tuples are deleted by index
* vacuuming. Each TID points to an LP_DEAD line pointer from a heap page
* that has been processed by lazy_scan_prune. Also needed by
* lazy_vacuum_heap_rel, which marks the same LP_DEAD line pointers as
* LP_UNUSED during second heap pass.
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
*/
VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
BlockNumber frozenskipped_pages; /* # of frozen pages we skipped */
BlockNumber tupcount_pages; /* pages whose tuples we counted */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
BlockNumber pages_removed; /* pages remove by truncation */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
double new_live_tuples; /* new estimated total # of live tuples */
/* Statistics output by index AMs */
IndexBulkDeleteResult **indstats;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Instrumentation counters */
int num_index_scans;
int64 tuples_deleted; /* # deleted from table */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
int64 lpdead_items; /* # deleted from indexes */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
int64 new_dead_tuples; /* new estimated total # of dead items in
* table */
int64 num_tuples; /* total number of nonremovable tuples */
int64 live_tuples; /* live tuples (reltuples estimate) */
} LVRelState;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* State returned by lazy_scan_prune()
*/
typedef struct LVPagePruneState
{
bool hastup; /* Page is truncatable? */
bool has_lpdead_items; /* includes existing LP_DEAD items */
/*
* State describes the proper VM bit states to set for the page following
* pruning and freezing. all_visible implies !has_lpdead_items, but don't
* trust all_frozen result unless all_visible is also set to true.
*/
bool all_visible; /* Every item visible to all? */
bool all_frozen; /* provided all_visible is also true */
TransactionId visibility_cutoff_xid; /* For recovery conflicts */
} LVPagePruneState;
/* Struct for saving and restoring vacuum error information. */
typedef struct LVSavedErrInfo
{
BlockNumber blkno;
OffsetNumber offnum;
VacErrPhase phase;
} LVSavedErrInfo;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* elevel controls whole VACUUM's verbosity */
static int elevel = -1;
/* non-export function prototypes */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void lazy_scan_heap(LVRelState *vacrel, VacuumParams *params,
bool aggressive);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
GlobalVisState *vistest,
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
LVPagePruneState *prunestate);
static void lazy_vacuum(LVRelState *vacrel);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Buffer buffer, int index, Buffer *vmbuffer);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
LVRelState *vacrel);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_cleanup_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
bool estimated_count,
LVRelState *vacrel);
static bool should_attempt_truncation(LVRelState *vacrel);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void update_index_statistics(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
int phase, BlockNumber blkno,
OffsetNumber offnum);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
*
* This routine sets things up for and then calls lazy_scan_heap, where
* almost all work actually takes place. Finalizes everything after call
* returns by managing rel truncation and updating pg_class statistics.
*
* At entry, we have already established a transaction and opened
* and locked the relation.
*/
void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
heap_vacuum_rel(Relation rel, VacuumParams *params,
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
BufferAccessStrategy bstrategy)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
LVRelState *vacrel;
PGRUsage ru0;
TimestampTz starttime = 0;
WalUsage walusage_start = pgWalUsage;
WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
write_rate;
bool aggressive; /* should we scan all unfrozen pages? */
bool scanned_all_unfrozen; /* actually scanned all such pages? */
char **indnames = NULL;
TransactionId xidFullScanLimit;
MultiXactId mxactFullScanLimit;
BlockNumber new_rel_pages;
BlockNumber new_rel_allvisible;
double new_live_tuples;
TransactionId new_frozen_xid;
Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
MultiXactId new_min_multi;
ErrorContextCallback errcallback;
PgStat_Counter startreadtime = 0;
PgStat_Counter startwritetime = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
TransactionId OldestXmin;
TransactionId FreezeLimit;
MultiXactId MultiXactCutoff;
/* measure elapsed time iff autovacuum logging requires it */
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
{
pg_rusage_init(&ru0);
starttime = GetCurrentTimestamp();
if (track_io_timing)
{
startreadtime = pgStatBlockReadTime;
startwritetime = pgStatBlockWriteTime;
}
}
if (params->options & VACOPT_VERBOSE)
elevel = INFO;
else
elevel = DEBUG2;
pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RelationGetRelid(rel));
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacuum_set_xid_limits(rel,
params->freeze_min_age,
params->freeze_table_age,
params->multixact_freeze_min_age,
params->multixact_freeze_table_age,
&OldestXmin, &FreezeLimit, &xidFullScanLimit,
&MultiXactCutoff, &mxactFullScanLimit);
/*
* We request an aggressive scan if the table's frozen Xid is now older
* than or equal to the requested Xid full-table scan limit; or if the
* table's minimum MultiXactId is older than or equal to the requested
* mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
aggressive = TransactionIdPrecedesOrEquals(rel->rd_rel->relfrozenxid,
xidFullScanLimit);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
aggressive |= MultiXactIdPrecedesOrEquals(rel->rd_rel->relminmxid,
mxactFullScanLimit);
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
aggressive = true;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel = (LVRelState *) palloc0(sizeof(LVRelState));
/* Set up high level stuff about rel */
vacrel->rel = rel;
vac_open_indexes(vacrel->rel, RowExclusiveLock, &vacrel->nindexes,
&vacrel->indrels);
vacrel->failsafe_active = false;
vacrel->consider_bypass_optimization = true;
/*
* The index_cleanup param either disables index vacuuming and cleanup or
* forces it to go ahead when we would otherwise apply the index bypass
* optimization. The default is 'auto', which leaves the final decision
* up to lazy_vacuum().
*
* The truncate param allows user to avoid attempting relation truncation,
* though it can't force truncation to happen.
*/
Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
params->truncate != VACOPTVALUE_AUTO);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->do_index_vacuuming = true;
vacrel->do_index_cleanup = true;
vacrel->do_rel_truncate = (params->truncate != VACOPTVALUE_DISABLED);
if (params->index_cleanup == VACOPTVALUE_DISABLED)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
{
/* Force disable index vacuuming up-front */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->do_index_vacuuming = false;
vacrel->do_index_cleanup = false;
}
else if (params->index_cleanup == VACOPTVALUE_ENABLED)
{
/* Force index vacuuming. Note that failsafe can still bypass. */
vacrel->consider_bypass_optimization = false;
}
else
{
/* Default/auto, make all decisions dynamically */
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->bstrategy = bstrategy;
vacrel->relfrozenxid = rel->rd_rel->relfrozenxid;
vacrel->relminmxid = rel->rd_rel->relminmxid;
vacrel->old_live_tuples = rel->rd_rel->reltuples;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Set cutoffs for entire VACUUM */
vacrel->OldestXmin = OldestXmin;
vacrel->FreezeLimit = FreezeLimit;
vacrel->MultiXactCutoff = MultiXactCutoff;
vacrel->relnamespace = get_namespace_name(RelationGetNamespace(rel));
vacrel->relname = pstrdup(RelationGetRelationName(rel));
vacrel->indname = NULL;
vacrel->phase = VACUUM_ERRCB_PHASE_UNKNOWN;
/* Save index names iff autovacuum logging requires it */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0 &&
vacrel->nindexes > 0)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
indnames = palloc(sizeof(char *) * vacrel->nindexes);
for (int i = 0; i < vacrel->nindexes; i++)
indnames[i] =
pstrdup(RelationGetRelationName(vacrel->indrels[i]));
}
/*
* Setup error traceback support for ereport(). The idea is to set up an
* error context callback to display additional information on any error
* during a vacuum. During different phases of vacuum (heap scan, heap
* vacuum, index vacuum, index clean up, heap truncate), we update the
* error context callback to display appropriate information.
*
* Note that the index vacuum and heap vacuum phases may be called
* multiple times in the middle of the heap scan phase. So the old phase
* information is restored at the end of those phases.
*/
errcallback.callback = vacuum_error_callback;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
errcallback.arg = vacrel;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
/*
* Call lazy_scan_heap to perform all required heap pruning, index
* vacuuming, and heap vacuuming (plus related processing)
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_scan_heap(vacrel, params, aggressive);
/* Done with indexes */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vac_close_indexes(vacrel->nindexes, vacrel->indrels, NoLock);
/*
* Compute whether we actually scanned the all unfrozen pages. If we did,
* we can adjust relfrozenxid and relminmxid.
*
* NB: We need to check this before truncating the relation, because that
* will change ->rel_pages.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if ((vacrel->scanned_pages + vacrel->frozenskipped_pages)
< vacrel->rel_pages)
{
Assert(!aggressive);
scanned_all_unfrozen = false;
}
else
scanned_all_unfrozen = true;
/*
* Optionally truncate the relation.
*/
if (should_attempt_truncation(vacrel))
{
/*
* Update error traceback information. This is the last phase during
* which we add context information to errors, so we don't need to
* revert to the previous phase.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
vacrel->nonempty_pages,
InvalidOffsetNumber);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_truncate_heap(vacrel);
}
/* Pop the error context stack */
error_context_stack = errcallback.previous;
/* Report that we are now doing final cleanup */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
/*
* Update statistics in pg_class.
*
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 18:21:51 +02:00
* In principle new_live_tuples could be -1 indicating that we (still)
* don't know the tuple count. In practice that probably can't happen,
* since we'd surely have scanned some pages if the table is new and
* nonempty.
*
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 18:21:51 +02:00
* For safety, clamp relallvisible to be not more than what we're setting
* relpages to.
*
* Also, don't change relfrozenxid/relminmxid if we skipped any pages,
* since then we don't know for certain that all tuples have a newer xmin.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
new_rel_pages = vacrel->rel_pages;
new_live_tuples = vacrel->new_live_tuples;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_count(rel, &new_rel_allvisible, NULL);
if (new_rel_allvisible > new_rel_pages)
new_rel_allvisible = new_rel_pages;
new_frozen_xid = scanned_all_unfrozen ? FreezeLimit : InvalidTransactionId;
new_min_multi = scanned_all_unfrozen ? MultiXactCutoff : InvalidMultiXactId;
Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vac_update_relstats(rel,
new_rel_pages,
new_live_tuples,
new_rel_allvisible,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->nindexes > 0,
Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 16:04:59 +01:00
new_frozen_xid,
new_min_multi,
false);
/*
* Report results to the stats collector, too.
*
* Deliberately avoid telling the stats collector about LP_DEAD items that
* remain in the table due to VACUUM bypassing index and heap vacuuming.
* ANALYZE will consider the remaining LP_DEAD items to be dead "tuples".
* It seems like a good idea to err on the side of not vacuuming again too
* soon in cases where the failsafe prevented significant amounts of heap
* vacuuming.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 18:21:51 +02:00
Max(new_live_tuples, 0),
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->new_dead_tuples);
pgstat_progress_end_command();
/* and log the action if appropriate */
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
{
TimestampTz endtime = GetCurrentTimestamp();
if (params->log_min_duration == 0 ||
TimestampDifferenceExceeds(starttime, endtime,
params->log_min_duration))
{
StringInfoData buf;
char *msgfmt;
BlockNumber orig_rel_pages;
2015-05-24 03:35:49 +02:00
TimestampDifference(starttime, endtime, &secs, &usecs);
memset(&walusage, 0, sizeof(WalUsage));
WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
{
read_rate = (double) BLCKSZ * VacuumPageMiss / (1024 * 1024) /
(secs + usecs / 1000000.0);
write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
(secs + usecs / 1000000.0);
}
/*
* This is pretty messy, but we split it up so that we can skip
* emitting individual parts of the message when not applicable.
*/
initStringInfo(&buf);
if (params->is_wraparound)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/*
* While it's possible for a VACUUM to be both is_wraparound
* and !aggressive, that's just a corner-case -- is_wraparound
* implies aggressive. Produce distinct output for the corner
* case all the same, just in case.
*/
if (aggressive)
msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
else
msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
}
else
{
if (aggressive)
msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
else
msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
}
appendStringInfo(&buf, msgfmt,
get_database_name(MyDatabaseId),
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relnamespace,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen\n"),
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->pages_removed,
vacrel->rel_pages,
vacrel->pinskipped_pages,
vacrel->frozenskipped_pages);
appendStringInfo(&buf,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable, oldest xmin: %u\n"),
(long long) vacrel->tuples_deleted,
(long long) vacrel->new_rel_tuples,
(long long) vacrel->new_dead_tuples,
OldestXmin);
orig_rel_pages = vacrel->rel_pages + vacrel->pages_removed;
if (orig_rel_pages > 0)
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
{
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
2021-08-07 12:09:22 +02:00
appendStringInfoString(&buf, _("index scan not needed: "));
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
else
2021-08-07 12:09:22 +02:00
appendStringInfoString(&buf, _("index scan needed: "));
msgfmt = _("%u pages from table (%.2f%% of total) had %lld dead item identifiers removed\n");
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
}
else
{
if (!vacrel->failsafe_active)
2021-08-07 12:09:22 +02:00
appendStringInfoString(&buf, _("index scan bypassed: "));
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
else
2021-08-07 12:09:22 +02:00
appendStringInfoString(&buf, _("index scan bypassed by failsafe: "));
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
}
appendStringInfo(&buf, msgfmt,
vacrel->lpdead_item_pages,
100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
(long long) vacrel->lpdead_items);
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
for (int i = 0; i < vacrel->nindexes; i++)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
IndexBulkDeleteResult *istat = vacrel->indstats[i];
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (!istat)
continue;
appendStringInfo(&buf,
_("index \"%s\": pages: %u in total, %u newly deleted, %u currently deleted, %u reusable\n"),
indnames[i],
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
istat->num_pages,
istat->pages_newly_deleted,
istat->pages_deleted,
istat->pages_free);
}
if (track_io_timing)
{
double read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
double write_ms = (double) (pgStatBlockWriteTime - startwritetime) / 1000;
appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
read_ms, write_ms);
}
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
appendStringInfo(&buf,
_("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
(long long) VacuumPageHit,
(long long) VacuumPageMiss,
(long long) VacuumPageDirty);
appendStringInfo(&buf,
_("WAL usage: %lld records, %lld full page images, %llu bytes\n"),
(long long) walusage.wal_records,
(long long) walusage.wal_fpi,
2020-09-14 06:42:07 +02:00
(unsigned long long) walusage.wal_bytes);
appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
ereport(LOG,
(errmsg_internal("%s", buf.data)));
pfree(buf.data);
}
}
/* Cleanup index statistics and index names */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
for (int i = 0; i < vacrel->nindexes; i++)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (vacrel->indstats[i])
pfree(vacrel->indstats[i]);
if (indnames && indnames[i])
pfree(indnames[i]);
}
}
/*
* lazy_scan_heap() -- workhorse function for VACUUM
*
* This routine prunes each page in the heap, and considers the need to
* freeze remaining tuples with storage (not including pages that can be
* skipped using the visibility map). Also performs related maintenance
* of the FSM and visibility map. These steps all take place during an
* initial pass over the target heap relation.
*
* Also invokes lazy_vacuum_all_indexes to vacuum indexes, which largely
* consists of deleting index tuples that point to LP_DEAD items left in
* heap pages following pruning. Earlier initial pass over the heap will
* have collected the TIDs whose index tuples need to be removed.
*
* Finally, invokes lazy_vacuum_heap_rel to vacuum heap pages, which
* largely consists of marking LP_DEAD items (from collected TID array)
* as LP_UNUSED. This has to happen in a second, final pass over the
* heap, to preserve a basic invariant that all index AMs rely on: no
* extant index tuple can ever be allowed to contain a TID that points to
* an LP_UNUSED line pointer in the heap. We must disallow premature
* recycling of line pointers to avoid index scans that get confused
* about which TID points to which tuple immediately after recycling.
* (Actually, this isn't a concern when target heap relation happens to
* have no indexes, which allows us to safely apply the one-pass strategy
* as an optimization).
*
* In practice we often have enough space to fit all TIDs, and so won't
* need to call lazy_vacuum more than once, after our initial pass over
* the heap has totally finished. Otherwise things are slightly more
* complicated: our "initial pass" over the heap applies only to those
* pages that were pruned before we needed to call lazy_vacuum, and our
* "final pass" over the heap only vacuums these same heap pages.
* However, we process indexes in full every time lazy_vacuum is called,
* which makes index processing very inefficient when memory is in short
* supply.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
VacDeadItems *dead_items;
BlockNumber nblocks,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
blkno,
next_unskippable_block,
next_failsafe_block,
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
next_fsm_block_to_vacuum;
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
bool skipping_blocks;
StringInfoData buf;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64 initprog_val[3];
snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot not look at at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until there use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaces with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further test can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-13 01:03:49 +02:00
GlobalVisState *vistest;
pg_rusage_init(&ru0);
if (aggressive)
ereport(elevel,
(errmsg("aggressively vacuuming \"%s.%s\"",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relnamespace,
vacrel->relname)));
else
ereport(elevel,
(errmsg("vacuuming \"%s.%s\"",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relnamespace,
vacrel->relname)));
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
nblocks = RelationGetNumberOfBlocks(vacrel->rel);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
next_unskippable_block = 0;
next_failsafe_block = 0;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
next_fsm_block_to_vacuum = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->rel_pages = nblocks;
vacrel->scanned_pages = 0;
vacrel->pinskipped_pages = 0;
vacrel->frozenskipped_pages = 0;
vacrel->tupcount_pages = 0;
vacrel->pages_removed = 0;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->lpdead_item_pages = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->nonempty_pages = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Initialize instrumentation counters */
vacrel->num_index_scans = 0;
vacrel->tuples_deleted = 0;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->lpdead_items = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->new_dead_tuples = 0;
vacrel->num_tuples = 0;
vacrel->live_tuples = 0;
snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot not look at at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until there use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaces with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further test can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-13 01:03:49 +02:00
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vistest = GlobalVisTestFor(vacrel->rel);
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Do failsafe precheck before calling dead_items_alloc. This ensures
* that parallel VACUUM won't be attempted when relfrozenxid is already
* dangerously old.
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*/
lazy_check_wraparound_failsafe(vacrel);
/*
* Allocate the space for dead_items. Note that this handles parallel
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* VACUUM initialization as part of allocating shared memory space used
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* for dead_items.
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items_alloc(vacrel, params->nworkers);
dead_items = vacrel->dead_items;
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
initprog_val[1] = nblocks;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
/*
* Except when aggressive is set, we want to skip pages that are
* all-visible according to the visibility map, but only when we can skip
* at least SKIP_PAGES_THRESHOLD consecutive pages. Since we're reading
* sequentially, the OS should be doing readahead for us, so there's no
* gain in skipping a page now and then; that's likely to disable
* readahead and so be counterproductive. Also, skipping even a single
* page means that we can't update relfrozenxid, so we only want to do it
* if we can skip a goodly number of pages.
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
*
* When aggressive is set, we can't skip pages just because they are
* all-visible, but we can still skip pages that are all-frozen, since
* such pages do not need freezing and do not affect the value that we can
* safely set for relfrozenxid or relminmxid.
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
*
* Before entering the main loop, establish the invariant that
2017-07-07 23:10:36 +02:00
* next_unskippable_block is the next block number >= blkno that we can't
* skip based on the visibility map, either all-visible for a regular scan
* or all-frozen for an aggressive scan. We set it to nblocks if there's
* no such block. We also set up the skipping_blocks flag correctly at
* this stage.
*
* Note: The value returned by visibilitymap_get_status could be slightly
* out-of-date, since we make this test before reading the corresponding
* heap page or locking the buffer. This is OK. If we mistakenly think
* that the page is all-visible or all-frozen when in fact the flag's just
* been cleared, we might fail to vacuum the page. It's easy to see that
* skipping a page when aggressive is not set is not a very big deal; we
* might leave some dead tuples lying around, but the next vacuum will
* find them. But even when aggressive *is* set, it's still OK if we miss
* a page whose all-frozen marking has just been cleared. Any new XIDs
* just added to that page are necessarily newer than the GlobalXmin we
* computed, so they'll have no effect on the value to which we can safely
* set relfrozenxid. A similar argument applies for MXIDs and relminmxid.
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
*
* We will scan the table's last page, at least to the extent of
* determining whether it has tuples or not, even if it should be skipped
* according to the above rules; except when we've already determined that
* it's not worth trying to truncate the table. This avoids having
* lazy_truncate_heap() take access-exclusive lock on the table to attempt
* a truncation that just fails immediately because there are tuples in
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
*/
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
{
while (next_unskippable_block < nblocks)
{
uint8 vmstatus;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vmstatus = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
&vmbuffer);
if (aggressive)
{
if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
break;
}
else
{
if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
break;
}
vacuum_delay_point();
next_unskippable_block++;
}
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
}
if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
skipping_blocks = true;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
else
skipping_blocks = false;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
for (blkno = 0; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
bool all_visible_according_to_vm = false;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
LVPagePruneState prunestate;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* Consider need to skip blocks. See note above about forcing
* scanning of last page.
*/
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
#define FORCE_CHECK_PAGE() \
(blkno == nblocks - 1 && should_attempt_truncation(vacrel))
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
blkno, InvalidOffsetNumber);
if (blkno == next_unskippable_block)
{
/* Time to advance next_unskippable_block */
next_unskippable_block++;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
{
uint8 vmskipflags;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vmskipflags = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
&vmbuffer);
if (aggressive)
{
if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
break;
}
else
{
if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
break;
}
vacuum_delay_point();
next_unskippable_block++;
}
}
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
/*
* We know we can't skip the current block. But set up
* skipping_blocks to do the right thing at the following blocks.
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
*/
if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
skipping_blocks = true;
else
skipping_blocks = false;
/*
* Normally, the fact that we can't skip this block must mean that
* it's not all-visible. But in an aggressive vacuum we know only
* that it's not all-frozen, so it might still be all-visible.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (aggressive && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
all_visible_according_to_vm = true;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
}
else
{
/*
* The current block is potentially skippable; if we've seen a
* long enough run of skippable blocks to justify skipping it, and
* we're not forced to check it, then go ahead and skip.
* Otherwise, the page must be at least all-visible if not
* all-frozen, so we can set all_visible_according_to_vm = true.
*/
if (skipping_blocks && !FORCE_CHECK_PAGE())
{
/*
* Tricky, tricky. If this is in aggressive vacuum, the page
* must have been all-frozen at the time we checked whether it
* was skippable, but it might not be any more. We must be
* careful to count it as a skipped all-frozen page in that
* case, or else we'll think we can't update relfrozenxid and
* relminmxid. If it's not an aggressive vacuum, we don't
* know whether it was all-frozen, so we have to recheck; but
* in this case an approximate answer is OK.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (aggressive || VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
vacrel->frozenskipped_pages++;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
continue;
}
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
all_visible_according_to_vm = true;
}
vacuum_delay_point();
2004-02-06 20:36:18 +01:00
/*
* Regularly check if wraparound failsafe should trigger.
*
* There is a similar check inside lazy_vacuum_all_indexes(), but
* relfrozenxid might start to look dangerously old before we reach
* that point. This check also provides failsafe coverage for the
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
if (blkno - next_failsafe_block >= FAILSAFE_EVERY_PAGES)
{
lazy_check_wraparound_failsafe(vacrel);
next_failsafe_block = blkno;
}
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Consider if we definitely have enough space to process TIDs on page
* already. If we are close to overrunning the available space for
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* dead_items TIDs, pause and do a cycle of vacuuming before we tackle
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* this page.
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Assert(dead_items->max_items >= MaxHeapTuplesPerPage);
if (dead_items->max_items - dead_items->num_items < MaxHeapTuplesPerPage)
{
/*
* Before beginning index vacuuming, we release any pin we may
* hold on the visibility map page. This isn't necessary for
* correctness, but we do it anyway to avoid holding the pin
* across a lengthy, unrelated operation.
*/
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
/* Perform a round of index and heap vacuuming */
vacrel->consider_bypass_optimization = false;
lazy_vacuum(vacrel);
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
/*
* Vacuum the Free Space Map to make newly-freed space visible on
* upper-level FSM pages. Note we have not yet processed blkno.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
blkno);
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
next_fsm_block_to_vacuum = blkno;
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
}
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Set up visibility map page as needed.
*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway. However, it's
* possible that (a) next_unskippable_block is covered by a different
* VM page than the current block or (b) we released our pin and did a
* cycle of index vacuuming.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vacrel->bstrategy);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* We need buffer cleanup lock so that we can prune HOT chains and
* defragment the page.
*/
if (!ConditionalLockBufferForCleanup(buf))
{
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
bool hastup;
/*
* If we're not performing an aggressive scan to guard against XID
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
* wraparound, and we don't want to forcibly check the page, then
* it's OK to skip vacuuming pages we get a lock conflict on. They
* will be dealt with in some future vacuum.
*/
if (!aggressive && !FORCE_CHECK_PAGE())
{
ReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->pinskipped_pages++;
continue;
}
/*
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
* Read the page with share lock to see if any xids on it need to
* be frozen. If not we just skip the page, after updating our
* scan statistics. If there are some, we wait for cleanup lock.
*
* We could defer the lock request further by remembering the page
2012-04-13 14:54:13 +02:00
* and coming back to it later, or we could even register
* ourselves for multiple buffers and then service whichever one
* is received first. For now, this seems good enough.
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
*
* If we get here with aggressive false, then we're just forcibly
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
* checking the page, and so we don't want to insist on getting
* the lock; we only need to know if the page contains tuples, so
* that we can update nonempty_pages correctly. It's convenient
* to use lazy_check_needs_freeze() for both situations, though.
*/
LockBuffer(buf, BUFFER_LOCK_SHARE);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (!lazy_check_needs_freeze(buf, &hastup, vacrel))
{
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->scanned_pages++;
vacrel->pinskipped_pages++;
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
if (hastup)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->nonempty_pages = blkno + 1;
continue;
}
if (!aggressive)
{
/*
* Here, we must not advance scanned_pages; that would amount
* to claiming that the page contains no freezable tuples.
*/
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->pinskipped_pages++;
if (hastup)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->nonempty_pages = blkno + 1;
continue;
}
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockBufferForCleanup(buf);
/* drop through to normal processing */
}
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* By here we definitely have enough dead_items space for whatever
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* LP_DEAD tids are on this page, we have the visibility map page set
* up in case we need to set this page's all_visible/all_frozen bit,
* and we have a cleanup lock. Any tuples on this page are now sure
* to be "counted" by this VACUUM.
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*
* One last piece of preamble needs to take place before we can prune:
* we need to consider new and empty pages.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->scanned_pages++;
vacrel->tupcount_pages++;
page = BufferGetPage(buf);
if (PageIsNew(page))
{
/*
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
* All-zeroes pages can be left over if either a backend extends
* the relation by a single page, but crashes before the newly
* initialized page has been written out, or when bulk-extending
* the relation (which creates a number of empty pages at the tail
* end of the relation, but enters them into the FSM).
*
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
* Note we do not enter the page into the visibilitymap. That has
* the downside that we repeatedly visit this page in subsequent
* vacuums, but otherwise we'll never not discover the space on a
* promoted standby. The harm of repeated checking ought to
* normally not be too bad - the space usually should be used at
* some point, otherwise there wouldn't be any regular vacuums.
*
* Make sure these pages are in the FSM, to ensure they can be
* reused. Do that by testing if there's any space recorded for
* the page. If not, enter it. We do so after releasing the lock
* on the heap page, the FSM is approximate, after all.
Move page initialization from RelationAddExtraBlocks() to use. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-01-28 22:15:11 +01:00
*/
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (GetRecordedFreeSpace(vacrel->rel, blkno) == 0)
{
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Size freespace = BLCKSZ - SizeOfPageHeaderData;
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
}
continue;
}
if (PageIsEmpty(page))
{
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Size freespace = PageGetHeapFreeSpace(page);
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
/*
* Empty pages are always all-visible and all-frozen (note that
* the same is currently not true for new pages, see above).
*/
if (!PageIsAllVisible(page))
{
START_CRIT_SECTION();
/* mark buffer dirty before writing a WAL record */
MarkBufferDirty(buf);
/*
* It's possible that another backend has extended the heap,
* initialized the page, and then failed to WAL-log the page
* due to an ERROR. Since heap extension is not WAL-logged,
* recovery might try to replay our record setting the page
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the
* page has been previously WAL-logged, and if not, do that
* now.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (RelationNeedsWAL(vacrel->rel) &&
PageGetLSN(page) == InvalidXLogRecPtr)
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
}
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
continue;
}
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Prune and freeze tuples.
*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Accumulates details of remaining LP_DEAD line pointers on page in
* dead_items array. This includes LP_DEAD line pointers that we
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* pruned ourselves, as well as existing LP_DEAD line pointers that
* were pruned some time earlier. Also considers freezing XIDs in the
* tuple headers of remaining items with storage.
*/
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate);
2010-07-06 21:19:02 +02:00
Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/* Remember the location of the last page with nonremovable tuples */
if (prunestate.hastup)
vacrel->nonempty_pages = blkno + 1;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (vacrel->nindexes == 0)
{
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Consider the need to do page-at-a-time heap vacuuming when
* using the one-pass strategy now.
*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* The one-pass strategy will never call lazy_vacuum(). The steps
* performed here can be thought of as the one-pass equivalent of
* a call to lazy_vacuum().
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (prunestate.has_lpdead_items)
{
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Size freespace;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
lazy_vacuum_heap_page(vacrel, blkno, buf, 0, &vmbuffer);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
/* Forget the LP_DEAD items that we just vacuumed */
dead_items->num_items = 0;
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Periodically perform FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note we have not yet
* performed FSM processing for blkno.
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
blkno);
next_fsm_block_to_vacuum = blkno;
}
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Now perform FSM processing for blkno, and move on to next
* page.
*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Our call to lazy_vacuum_heap_page() will have considered if
* it's possible to set all_visible/all_frozen independently
* of lazy_scan_prune(). Note that prunestate was invalidated
* by lazy_vacuum_heap_page() call.
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
freespace = PageGetHeapFreeSpace(page);
2010-07-06 21:19:02 +02:00
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
continue;
}
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
/*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* There was no call to lazy_vacuum_heap_page() because pruning
* didn't encounter/create any LP_DEAD items that needed to be
* vacuumed. Prune state has not been invalidated, so proceed
* with prunestate-driven visibility map and FSM steps (just like
* the two-pass strategy).
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Assert(dead_items->num_items == 0);
}
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* Handle setting visibility map bit based on what the VM said about
* the page before pruning started, and using prunestate
*/
if (!all_visible_according_to_vm && prunestate.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (prunestate.all_frozen)
flags |= VISIBILITYMAP_ALL_FROZEN;
/*
* It should never be the case that the visibility map page is set
* while the page-level bit is clear, but the reverse is allowed
* (if checksums are not enabled). Regardless, set both bits so
* that we get back in sync.
*
* NB: If the heap page is all-visible but the VM bit is not set,
* we don't need to dirty the heap page. However, if checksums
* are enabled, we do need to make sure that the heap page is
* dirtied before passing it to visibilitymap_set(), because it
* may be logged. Given that this situation should only happen in
* rare cases after a crash, it is not worth optimizing.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vmbuffer, prunestate.visibility_cutoff_xid,
flags);
}
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-level bit is clear. However, it's possible that the bit
* got cleared after we checked it and before we took the buffer
* content lock, so we must recheck before jumping to the conclusion
* that something bad has happened.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
&& VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
{
elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname, blkno);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
2011-04-10 17:42:00 +02:00
/*
snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot not look at at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until there use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaces with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further test can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-13 01:03:49 +02:00
* It's possible for the value returned by
* GetOldestNonRemovableTransactionId() to move backwards, so it's not
* wrong for us to see tuples that appear to not be visible to
* everyone yet, while PD_ALL_VISIBLE is already set. The real safe
* xmin value never moves backwards, but
* GetOldestNonRemovableTransactionId() is conservative and sometimes
* returns a value that's unnecessarily small, so if we see that
* contradiction it just means that the tuples that we think are not
* visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
* is correct.
*
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
* set, however.
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}
/*
* If the all-visible page is all-frozen but not marked as such yet,
* mark it as all-frozen. Note that all_frozen is only valid if
* all_visible is true, so we must check both.
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
else if (all_visible_according_to_vm && prunestate.all_visible &&
prunestate.all_frozen &&
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
!VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* We can pass InvalidTransactionId as the cutoff XID here,
* because setting the all-frozen bit doesn't cause recovery
* conflicts.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_FROZEN);
}
/*
* Final steps for block: drop cleanup lock, record free space in the
* FSM
*/
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
{
/*
* Wait until lazy_vacuum_heap_rel() to save free space. This
* doesn't just save us some cycles; it also allows us to record
* any additional free space that lazy_vacuum_heap_page() will
* make available in cases where it's possible to truncate the
* page's line pointer array.
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
* Note: It's not in fact 100% certain that we really will call
* lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
* index vacuuming (and so must skip heap vacuuming). This is
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
* deemed okay because it only happens in emergencies, or when
* there is very little free space anyway. (Besides, we start
* recording free space in the FSM once index vacuuming has been
* abandoned.)
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
* Note: The one-pass (no indexes) case is only supposed to make
* it this far when there were no LP_DEAD items during pruning.
*/
Assert(vacrel->nindexes > 0);
UnlockReleaseBuffer(buf);
}
else
{
Size freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
}
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/* report that everything is now scanned */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
/* Clear the block number information */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = InvalidBlockNumber;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
/* now we can compute the new value for pg_class.reltuples */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, nblocks,
vacrel->tupcount_pages,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->live_tuples);
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 18:21:51 +02:00
/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->new_rel_tuples =
Max(vacrel->new_live_tuples, 0) + vacrel->new_dead_tuples;
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
/*
* Release any remaining pin on visibility map page.
*/
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
/* Perform a final round of index and heap vacuuming */
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
if (dead_items->num_items > 0)
lazy_vacuum(vacrel);
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
/*
* Vacuum the remainder of the Free Space Map. We must do this whether or
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
* not there were indexes, and whether or not we bypassed index vacuuming.
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
*/
if (blkno > next_fsm_block_to_vacuum)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
While vacuuming a large table, update upper-level FSM data every so often. VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 17:29:54 +02:00
/* report all blocks vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
/* Do post-vacuum cleanup */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_cleanup_all_indexes(vacrel);
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Free resources managed by dead_items_alloc. This will end parallel
* mode when needed (it must end before updating index statistics as we
* can't write in parallel mode).
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items_cleanup(vacrel);
/* Update index statistics */
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_index_statistics(vacrel);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* When the table has no indexes (i.e. in the one-pass strategy case),
* make log report that lazy_vacuum_heap_rel would've made had there been
* indexes. (As in the two-pass strategy case, only make this report when
* there were LP_DEAD line pointers vacuumed in lazy_vacuum_heap_page.)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
if (vacrel->nindexes == 0 && vacrel->lpdead_item_pages > 0)
ereport(elevel,
2021-08-07 12:09:22 +02:00
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
vacrel->relname, (long long) vacrel->lpdead_items,
vacrel->lpdead_item_pages)));
/*
* Make a log report summarizing pruning and freezing.
*
* The autovacuum specific logging in heap_vacuum_rel summarizes an entire
* VACUUM operation, whereas each VACUUM VERBOSE log report generally
* summarizes a single round of index/heap vacuuming (or rel truncation).
* It wouldn't make sense to report on pruning or freezing while following
* that convention, though. You can think of this log report as a summary
* of our first pass over the heap.
*/
initStringInfo(&buf);
appendStringInfo(&buf,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
_("%lld dead row versions cannot be removed yet, oldest xmin: %u\n"),
(long long) vacrel->new_dead_tuples, vacrel->OldestXmin);
appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
"Skipped %u pages due to buffer pins, ",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->pinskipped_pages),
vacrel->pinskipped_pages);
appendStringInfo(&buf, ngettext("%u frozen page.\n",
"%u frozen pages.\n",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->frozenskipped_pages),
vacrel->frozenskipped_pages);
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
(errmsg("table \"%s.%s\": found %lld removable, %lld nonremovable row versions in %u out of %u pages",
vacrel->relnamespace,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
(long long) vacrel->tuples_deleted,
(long long) vacrel->num_tuples, vacrel->scanned_pages,
nblocks),
errdetail_internal("%s", buf.data)));
pfree(buf.data);
}
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
* Caller must hold pin and buffer cleanup lock on the buffer.
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
*
* Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
* was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
* whether or not a tuple should be considered DEAD. This happened when an
* inserting transaction concurrently aborted (after our heap_page_prune()
* call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
* of complexity just so we could deal with tuples that were DEAD to VACUUM,
* but nevertheless were left with storage after pruning.
*
* The approach we take now is to restart pruning when the race condition is
* detected. This allows heap_page_prune() to prune the tuples inserted by
* the now-aborted transaction. This is a little crude, but it guarantees
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* that any items that make it into the dead_items array are simple LP_DEAD
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
* line pointers, and that every remaining item with tuple storage is
* considered as a candidate for freezing.
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
static void
lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
GlobalVisState *vistest,
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
LVPagePruneState *prunestate)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
{
Relation rel = vacrel->rel;
OffsetNumber offnum,
maxoff;
ItemId itemid;
HeapTupleData tuple;
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
HTSV_Result res;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
int tuples_deleted,
lpdead_items,
new_dead_tuples,
num_tuples,
live_tuples;
int nnewlpdead;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
int nfrozen;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
maxoff = PageGetMaxOffsetNumber(page);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
retry:
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/* Initialize (or reset) page-level counters */
tuples_deleted = 0;
lpdead_items = 0;
new_dead_tuples = 0;
num_tuples = 0;
live_tuples = 0;
/*
* Prune all HOT-update chains in this page.
*
* We count tuples removed by the pruning step as tuples_deleted. Its
* final value can be thought of as the number of tuples that have been
* deleted from the table. It should not be confused with lpdead_items;
* lpdead_items's final value can be thought of as the number of tuples
* that were deleted from indexes.
*/
tuples_deleted = heap_page_prune(rel, buf, vistest,
InvalidTransactionId, 0, &nnewlpdead,
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
&vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage
*/
prunestate->hastup = false;
prunestate->has_lpdead_items = false;
prunestate->all_visible = true;
prunestate->all_frozen = true;
prunestate->visibility_cutoff_xid = InvalidTransactionId;
nfrozen = 0;
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
bool tuple_totally_frozen;
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
continue;
/* Redirect items mustn't be touched */
if (ItemIdIsRedirected(itemid))
{
prunestate->hastup = true; /* page won't be truncatable */
continue;
}
/*
* LP_DEAD items are processed outside of the loop.
*
* Note that we deliberately don't set hastup=true in the case of an
* LP_DEAD item here, which is not how lazy_check_needs_freeze() or
* count_nondeletable_pages() do it -- they only consider pages empty
* when they only have LP_UNUSED items, which is important for
* correctness.
*
* Our assumption is that any LP_DEAD items we encounter here will
* become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
* call count_nondeletable_pages(). In any case our opinion of
* whether or not a page 'hastup' (which is how our caller sets its
* vacrel->nonempty_pages value) is inherently race-prone. It must be
* treated as advisory/unreliable, so we might as well be slightly
* optimistic.
*/
if (ItemIdIsDead(itemid))
{
deadoffsets[lpdead_items++] = offnum;
prunestate->all_visible = false;
prunestate->has_lpdead_items = true;
continue;
}
Assert(ItemIdIsNormal(itemid));
ItemPointerSet(&(tuple.t_self), blkno, offnum);
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/*
* DEAD tuples are almost always pruned into LP_DEAD line pointers by
* heap_page_prune(), but it's possible that the tuple state changed
* since heap_page_prune() looked. Handle that here by restarting.
* (See comments at the top of function for a full explanation.)
*/
res = HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf);
if (unlikely(res == HEAPTUPLE_DEAD))
goto retry;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
* and ANALYZE may produce wildly different reltuples values, e.g.
* when there are many recently-dead tuples.
*
* The logic here is a bit simpler than acquire_sample_rows(), as
* VACUUM can't run inside a transaction block, which makes some cases
* impossible (e.g. in-progress insert from the same transaction).
*
* We treat LP_DEAD items (which are the closest thing to DEAD tuples
* that might be seen here) differently, too: we assume that they'll
* become LP_UNUSED before VACUUM finishes. This difference is only
* superficial. VACUUM effectively agrees with ANALYZE about DEAD
* items, in the end. VACUUM won't remember LP_DEAD items, but only
* because they're not supposed to be left behind when it is done.
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
switch (res)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
{
case HEAPTUPLE_LIVE:
/*
* Count it as live. Not only is this natural, but it's also
* what acquire_sample_rows() does.
*/
live_tuples++;
/*
* Is the tuple definitely visible to all transactions?
*
* NB: Like with per-tuple hint bits, we can't set the
* PD_ALL_VISIBLE flag if the inserter committed
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
if (prunestate->all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
prunestate->all_visible = false;
break;
}
/*
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin, vacrel->OldestXmin))
{
prunestate->all_visible = false;
break;
}
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid))
prunestate->visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
/*
* If tuple is recently deleted then we must not remove it
* from relation. (We only remove items that are LP_DEAD from
* pruning.)
*/
new_dead_tuples++;
prunestate->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
* We do not count these rows as live, because we expect the
* inserting transaction to update the counters at commit, and
* we assume that will happen only after we report our
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
prunestate->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
prunestate->all_visible = false;
/*
* Count such rows as live. As above, we assume the deleting
* transaction will commit and update the counters after we
* report.
*/
live_tuples++;
break;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/*
* Non-removable tuple (i.e. tuple with storage).
*
* Check tuple left behind after pruning to see if needs to be frozen
* now.
*/
num_tuples++;
prunestate->hastup = true;
if (heap_prepare_freeze_tuple(tuple.t_data,
vacrel->relfrozenxid,
vacrel->relminmxid,
vacrel->FreezeLimit,
vacrel->MultiXactCutoff,
&frozen[nfrozen],
&tuple_totally_frozen))
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
{
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/* Will execute freeze below */
frozen[nfrozen++].offset = offnum;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/*
* If tuple is not frozen (and not about to become frozen) then caller
* had better not go on to set this page's VM bit
*/
if (!tuple_totally_frozen)
prunestate->all_frozen = false;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
/*
* We have now divided every item on the page into either an LP_DEAD item
* that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
* that remains and needs to be considered for freezing now (LP_UNUSED and
* LP_REDIRECT items also remain, but are of no further interest to us).
*/
vacrel->offnum = InvalidOffsetNumber;
/*
* Consider the need to freeze any items with tuple storage from the page
* first (arbitrary)
*/
if (nfrozen > 0)
{
Assert(prunestate->hastup);
/*
* At least one tuple with storage needs to be frozen -- execute that
* now.
*
* If we need to freeze any tuples we'll mark the buffer dirty, and
* write a WAL record recording the changes. We must log the changes
* to be crash-safe against future truncation of CLOG.
*/
START_CRIT_SECTION();
MarkBufferDirty(buf);
/* execute collected freezes */
for (int i = 0; i < nfrozen; i++)
{
HeapTupleHeader htup;
itemid = PageGetItemId(page, frozen[i].offset);
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, &frozen[i]);
}
/* Now WAL-log freezing if necessary */
if (RelationNeedsWAL(vacrel->rel))
{
XLogRecPtr recptr;
recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
frozen, nfrozen);
PageSetLSN(page, recptr);
}
END_CRIT_SECTION();
}
/*
* The second pass over the heap can also set visibility map bits, using
* the same approach. This is important when the table frequently has a
* few old LP_DEAD items on each page by the time we get to it (typically
* because past opportunistic pruning operations freed some non-HOT
* tuples).
*
* VACUUM will call heap_page_is_all_visible() during the second pass over
* the heap to determine all_visible and all_frozen for the page -- this
* is a specialized version of the logic from this function. Now that
* we've finished pruning and freezing, make sure that we're in total
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
if (prunestate->all_visible)
{
TransactionId cutoff;
bool all_frozen;
if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
Assert(false);
Assert(lpdead_items == 0);
Assert(prunestate->all_frozen == all_frozen);
/*
* It's possible that we froze tuples and made the page's XID cutoff
* (for recovery conflict purposes) FrozenTransactionId. This is okay
* because visibility_cutoff_xid will be logged by our caller in a
* moment.
*/
Assert(cutoff == FrozenTransactionId ||
cutoff == prunestate->visibility_cutoff_xid);
}
#endif
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Now save details of the LP_DEAD items from the page in vacrel
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
if (lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
ItemPointerData tmp;
Assert(!prunestate->all_visible);
Assert(prunestate->has_lpdead_items);
vacrel->lpdead_item_pages++;
ItemPointerSetBlockNumber(&tmp, blkno);
for (int i = 0; i < lpdead_items; i++)
{
ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items->items[dead_items->num_items++] = tmp;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Assert(dead_items->num_items <= dead_items->max_items);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items->num_items);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += tuples_deleted;
vacrel->lpdead_items += lpdead_items;
vacrel->new_dead_tuples += new_dead_tuples;
vacrel->num_tuples += num_tuples;
vacrel->live_tuples += live_tuples;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
/*
* Main entry point for index vacuuming and heap vacuuming.
*
* Removes items collected in dead_items from table's indexes, then marks the
* same items LP_UNUSED in the heap. See the comments above lazy_scan_heap
* for full details.
*
* Also empties dead_items, freeing up space for later TIDs.
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
* We may choose to bypass index vacuuming at this point, though only when the
* ongoing VACUUM operation will definitely only have one index scan/round of
* index vacuuming.
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
static void
lazy_vacuum(LVRelState *vacrel)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
{
bool bypass;
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
Assert(!vacrel->do_index_cleanup);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->dead_items->num_items = 0;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
return;
}
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
/*
* Consider bypassing index vacuuming (and heap vacuuming) entirely.
*
* We currently only do this in cases where the number of LP_DEAD items
* for the entire VACUUM operation is close to zero. This avoids sharp
* discontinuities in the duration and overhead of successive VACUUM
* operations that run against the same table with a fixed workload.
* Ideally, successive VACUUM operations will behave as if there are
* exactly zero LP_DEAD items in cases where there are close to zero.
*
* This is likely to be helpful with a table that is continually affected
* by UPDATEs that can mostly apply the HOT optimization, but occasionally
* have small aberrations that lead to just a few heap pages retaining
* only one or two LP_DEAD items. This is pretty common; even when the
* DBA goes out of their way to make UPDATEs use HOT, it is practically
* impossible to predict whether HOT will be applied in 100% of cases.
* It's far easier to ensure that 99%+ of all UPDATEs against a table use
* HOT through careful tuning.
*/
bypass = false;
if (vacrel->consider_bypass_optimization && vacrel->rel_pages > 0)
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
{
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Assert(vacrel->lpdead_items == vacrel->dead_items->num_items);
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
/*
* This crossover point at which we'll start to do index vacuuming is
* expressed as a percentage of the total number of heap pages in the
* table that are known to have at least one LP_DEAD item. This is
* much more important than the total number of LP_DEAD items, since
* it's a proxy for the number of heap pages whose visibility map bits
* cannot be set on account of bypassing index and heap vacuuming.
*
* We apply one further precautionary test: the space currently used
* to store the TIDs (TIDs that now all point to LP_DEAD items) must
* not exceed 32MB. This limits the risk that we will bypass index
* vacuuming again and again until eventually there is a VACUUM whose
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* dead_items space is not CPU cache resident.
*
* We don't take any special steps to remember the LP_DEAD items (such
* as counting them in new_dead_tuples report to the stats collector)
* when the optimization is applied. Though the accounting used in
* analyze.c's acquire_sample_rows() will recognize the same LP_DEAD
* items as dead rows in its own stats collector report, that's okay.
* The discrepancy should be negligible. If this optimization is ever
* expanded to cover more cases then this may need to be reconsidered.
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
bypass = (vacrel->lpdead_item_pages < threshold &&
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->lpdead_items < MAXDEADITEMS(32L * 1024L * 1024L));
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
}
if (bypass)
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
{
/*
* There are almost zero TIDs. Behave as if there were precisely
* zero: bypass index vacuuming, but do index cleanup.
*
* We expect that the ongoing VACUUM operation will finish very
* quickly, so there is no point in considering speeding up as a
* failsafe against wraparound failure. (Index cleanup is expected to
* finish very quickly in cases where there were no ambulkdelete()
* calls.)
*/
vacrel->do_index_vacuuming = false;
ereport(elevel,
2021-08-07 12:09:22 +02:00
(errmsg("table \"%s\": index scan bypassed: %u pages from table (%.2f%% of total) have %lld dead item identifiers",
vacrel->relname, vacrel->lpdead_item_pages,
Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
2021-04-08 01:14:54 +02:00
100.0 * vacrel->lpdead_item_pages / vacrel->rel_pages,
(long long) vacrel->lpdead_items)));
}
else if (lazy_vacuum_all_indexes(vacrel))
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
{
/*
* We successfully completed a round of index vacuuming. Do related
* heap vacuuming now.
*/
lazy_vacuum_heap_rel(vacrel);
}
else
{
/*
* Failsafe case.
*
* We attempted index vacuuming, but didn't finish a full round/full
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
* index scan. This happens when relfrozenxid or relminmxid is too
* far in the past.
*
* From this point on the VACUUM operation will do no further index
* vacuuming or heap vacuuming. This VACUUM operation won't end up
* back here again.
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*/
Assert(vacrel->failsafe_active);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
}
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
/*
* Forget the LP_DEAD items that we just vacuumed (or just decided to not
* vacuum)
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->dead_items->num_items = 0;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
}
/*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* lazy_vacuum_all_indexes() -- Main entry for index vacuuming
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*
* Returns true in the common case when all indexes were successfully
* vacuumed. Returns false in rare cases where we determined that the ongoing
* VACUUM operation is at risk of taking too long to finish, leading to
* wraparound failure.
*/
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
static bool
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_vacuum_all_indexes(LVRelState *vacrel)
{
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
bool allindexes = true;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Assert(vacrel->nindexes > 0);
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Assert(TransactionIdIsNormal(vacrel->relfrozenxid));
Assert(MultiXactIdIsValid(vacrel->relminmxid));
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/* Precheck for XID wraparound emergencies */
if (lazy_check_wraparound_failsafe(vacrel))
{
/* Wraparound emergency -- don't even start an index scan */
return false;
}
/* Report that we are now vacuuming indexes */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (!ParallelVacuumIsActive(vacrel))
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
IndexBulkDeleteResult *istat = vacrel->indstats[idx];
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->indstats[idx] =
lazy_vacuum_one_index(indrel, istat, vacrel->old_live_tuples,
vacrel);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
if (lazy_check_wraparound_failsafe(vacrel))
{
/* Wraparound emergency -- end current index scan */
allindexes = false;
break;
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
}
}
else
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Outsource everything to parallel variant */
parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples,
vacrel->num_index_scans);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
* that parallel VACUUM only gets the precheck and this postcheck.
*/
if (lazy_check_wraparound_failsafe(vacrel))
allindexes = false;
}
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/*
* We delete all LP_DEAD items from the first heap pass in all indexes on
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
* each call here (except calls where we choose to do the failsafe). This
* makes the next call to lazy_vacuum_heap_rel() safe (except in the event
* of the failsafe triggering, which prevents the next call from taking
* place).
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
*/
Assert(vacrel->num_index_scans > 0 ||
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->dead_items->num_items == vacrel->lpdead_items);
Assert(allindexes || vacrel->failsafe_active);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/*
* Increase and report the number of index scans.
*
* We deliberately include the case where we started a round of bulk
* deletes that we weren't able to finish due to the failsafe triggering.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->num_index_scans++;
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_INDEX_VACUUMS,
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->num_index_scans);
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
return allindexes;
}
/*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* This routine marks LP_DEAD items in vacrel->dead_items array as LP_UNUSED.
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
* Pages that never had lazy_scan_prune record LP_DEAD items are not visited
* at all.
*
* We may also be able to truncate the line pointer array of the heap pages we
* visit. If there is a contiguous group of LP_UNUSED items at the end of the
* array, it can be reclaimed as free space. These LP_UNUSED items usually
* start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
* each page to LP_UNUSED, and then consider if it's possible to truncate the
* page's line pointer array).
*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* Note: the reason for doing this as a second pass is we cannot remove the
* tuples until we've removed their index entries, and we want to process
* index entry removal in batches as large as possible.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_vacuum_heap_rel(LVRelState *vacrel)
{
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
int index;
BlockNumber vacuumed_pages;
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
LVSavedErrInfo saved_err_info;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
Assert(vacrel->num_index_scans > 0);
/* Report that we are now vacuuming the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
/* Update error traceback information */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_vacuum_error_info(vacrel, &saved_err_info,
VACUUM_ERRCB_PHASE_VACUUM_HEAP,
InvalidBlockNumber, InvalidOffsetNumber);
pg_rusage_init(&ru0);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacuumed_pages = 0;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
index = 0;
while (index < vacrel->dead_items->num_items)
{
BlockNumber tblk;
Buffer buf;
Page page;
Size freespace;
vacuum_delay_point();
2004-02-06 20:36:18 +01:00
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
tblk = ItemPointerGetBlockNumber(&vacrel->dead_items->items[index]);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = tblk;
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, tblk, RBM_NORMAL,
vacrel->bstrategy);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
index = lazy_vacuum_heap_page(vacrel, tblk, buf, index, &vmbuffer);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/* Now that we've vacuumed the page, record its available space */
page = BufferGetPage(buf);
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RecordPageWithFreeSpace(vacrel->rel, tblk, freespace);
vacuumed_pages++;
}
/* Clear the block number information */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
{
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
/*
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
Assert(index > 0);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
Assert(vacrel->num_index_scans > 1 ||
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
(index == vacrel->lpdead_items &&
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
vacuumed_pages == vacrel->lpdead_item_pages));
ereport(elevel,
2021-08-07 12:09:22 +02:00
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->relname, (long long) index, vacuumed_pages),
errdetail_internal("%s", pg_rusage_show(&ru0))));
/* Revert to the previous phase information for error traceback */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
restore_vacuum_error_info(vacrel, &saved_err_info);
}
/*
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
* lazy_vacuum_heap_page() -- free page's LP_DEAD items listed in the
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* vacrel->dead_items array.
*
* Caller must have an exclusive buffer lock on the buffer (though a full
* cleanup lock is also acceptable).
*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* index is an offset into the vacrel->dead_items array for the first listed
* LP_DEAD item on the page. The return value is the first index immediately
* after all LP_DEAD items for the same page in the array.
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
*
* Prior to PostgreSQL 14 there were rare cases where this routine had to set
* tuples with storage to unused. These days it is strictly responsible for
* marking LP_DEAD stub line pointers as unused. This only happens for those
* LP_DEAD items on the page that were determined to be LP_DEAD items back
* when the same page was visited by lazy_scan_prune() (i.e. those whose TID
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* was recorded in the dead_items array at the time).
*/
static int
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
int index, Buffer *vmbuffer)
{
VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
TransactionId visibility_cutoff_xid;
bool all_frozen;
LVSavedErrInfo saved_err_info;
Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41fe). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a03, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 16:49:39 +02:00
Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
/* Update error traceback information */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_vacuum_error_info(vacrel, &saved_err_info,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
START_CRIT_SECTION();
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
for (; index < dead_items->num_items; index++)
{
BlockNumber tblk;
OffsetNumber toff;
ItemId itemid;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
tblk = ItemPointerGetBlockNumber(&dead_items->items[index]);
if (tblk != blkno)
break; /* past end of tuples for this block */
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
toff = ItemPointerGetOffsetNumber(&dead_items->items[index]);
itemid = PageGetItemId(page, toff);
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
}
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
Assert(uncnt > 0);
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
/*
* Mark buffer dirty before we write WAL.
*/
MarkBufferDirty(buffer);
/* XLOG stuff */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (RelationNeedsWAL(vacrel->rel))
{
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
xl_heap_vacuum xlrec;
XLogRecPtr recptr;
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
xlrec.nunused = uncnt;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapVacuum);
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) unused, uncnt * sizeof(OffsetNumber));
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VACUUM);
PageSetLSN(page, recptr);
}
/*
* End critical section, so we safely can do visibility tests (which
* possibly need to perform IO and allocate memory!). If we crash now the
* page (including the corresponding vm bit) might not be marked all
* visible, but that's fine. A later vacuum will fix that.
*/
END_CRIT_SECTION();
/*
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
* Now that we have removed the LD_DEAD items from the page, once again
* check if the page has become all-visible. The page is already marked
* dirty, exclusively locked, and, if needed, a full page image has been
Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit dd695979) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 17:49:22 +02:00
* emitted.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
&all_frozen))
PageSetAllVisible(page);
/*
* All the changes to the heap page have been done. If the all-visible
* flag is now set, also set the VM all-visible bit (and, if possible, the
* all-frozen bit) unless this has already been done previously.
*/
if (PageIsAllVisible(page))
{
uint8 flags = 0;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
uint8 vm_status = visibilitymap_get_status(vacrel->rel,
blkno, vmbuffer);
/* Set the VM all-frozen bit to flag, if needed */
if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
flags |= VISIBILITYMAP_ALL_VISIBLE;
if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
flags |= VISIBILITYMAP_ALL_FROZEN;
Assert(BufferIsValid(*vmbuffer));
if (flags != 0)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
visibilitymap_set(vacrel->rel, blkno, buffer, InvalidXLogRecPtr,
*vmbuffer, visibility_cutoff_xid, flags);
}
/* Revert to the previous phase information for error traceback */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
restore_vacuum_error_info(vacrel, &saved_err_info);
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
return index;
}
/*
* lazy_check_needs_freeze() -- scan page to see if any tuples
* need to be cleaned to avoid wraparound
*
* Returns true if the page needs to be vacuumed using cleanup lock.
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
* Also returns a flag indicating whether page contains any tuples at all.
*/
static bool
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_check_needs_freeze(Buffer buf, bool *hastup, LVRelState *vacrel)
{
Page page = BufferGetPage(buf);
OffsetNumber offnum,
maxoff;
HeapTupleHeader tupleheader;
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
*hastup = false;
Move page initialization from RelationAddExtraBlocks() to use, take 2. Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
2019-02-03 10:27:19 +01:00
/*
* New and empty pages, obviously, don't contain tuples. We could make
* sure that the page is registered in the FSM, but it doesn't seem worth
* waiting for a cleanup lock just for that, especially because it's
* likely that the pin holder will do so.
*/
if (PageIsNew(page) || PageIsEmpty(page))
return false;
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
ItemId itemid;
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
/* this should match hastup test in count_nondeletable_pages() */
if (ItemIdIsUsed(itemid))
*hastup = true;
/* dead and redirect items never need freezing */
if (!ItemIdIsNormal(itemid))
continue;
tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (heap_tuple_needs_freeze(tupleheader, vacrel->FreezeLimit,
vacrel->MultiXactCutoff, buf))
break;
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->offnum = InvalidOffsetNumber;
return (offnum <= maxoff);
}
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
/*
* Trigger the failsafe to avoid wraparound failure when vacrel table has a
* relfrozenxid and/or relminmxid that is dangerously far in the past.
* Triggering the failsafe makes the ongoing VACUUM bypass any further index
* vacuuming and heap vacuuming. Truncating the heap is also bypassed.
*
* Any remaining work (work that VACUUM cannot just bypass) is typically sped
* up when the failsafe triggers. VACUUM stops applying any cost-based delay
* that it started out with.
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
*
* Returns true when failsafe has been triggered.
*/
static bool
lazy_check_wraparound_failsafe(LVRelState *vacrel)
{
/* Don't warn more than once per VACUUM */
if (vacrel->failsafe_active)
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
return true;
if (unlikely(vacuum_xid_failsafe_check(vacrel->relfrozenxid,
vacrel->relminmxid)))
{
vacrel->failsafe_active = true;
/* Disable index vacuuming, index cleanup, and heap rel truncation */
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
vacrel->do_index_vacuuming = false;
vacrel->do_index_cleanup = false;
vacrel->do_rel_truncate = false;
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
ereport(WARNING,
(errmsg("bypassing nonessential maintenance of table \"%s.%s.%s\" as a failsafe after %d index scans",
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
get_database_name(MyDatabaseId),
vacrel->relnamespace,
vacrel->relname,
vacrel->num_index_scans),
2021-08-07 12:09:22 +02:00
errdetail("The table's relfrozenxid or relminmxid is too far in the past."),
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
errhint("Consider increasing configuration parameter \"maintenance_work_mem\" or \"autovacuum_work_mem\".\n"
"You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.")));
/* Stop applying cost limits from this point on */
VacuumCostActive = false;
VacuumCostBalance = 0;
return true;
}
return false;
}
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (!ParallelVacuumIsActive(vacrel))
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
double reltuples = vacrel->new_rel_tuples;
bool estimated_count =
vacrel->tupcount_pages < vacrel->rel_pages;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
IndexBulkDeleteResult *istat = vacrel->indstats[idx];
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->indstats[idx] =
lazy_cleanup_one_index(indrel, istat, reltuples,
estimated_count, vacrel);
}
}
else
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* Outsource everything to parallel variant */
parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
vacrel->num_index_scans,
(vacrel->tupcount_pages < vacrel->rel_pages));
}
}
/*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* lazy_vacuum_one_index() -- vacuum index relation.
*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Delete all the index tuples containing a TID collected in
* vacrel->dead_items array. Also update running statistics.
* Exact details depend on index AM's ambulkdelete routine.
*
* reltuples is the number of heap tuples to be passed to the
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com
2020-08-30 18:21:51 +02:00
* bulkdelete callback. It's always assumed to be estimated.
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* See indexam.sgml for more info.
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
*
* Returns bulk delete stats derived from input stats
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static IndexBulkDeleteResult *
lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
double reltuples, LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
LVSavedErrInfo saved_err_info;
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
ivinfo.estimated_count = true;
ivinfo.message_level = elevel;
ivinfo.num_heap_tuples = reltuples;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
ivinfo.strategy = vacrel->bstrategy;
/*
* Update error traceback information.
*
* The index name is saved during this phase and restored immediately
* after this phase. See vacuum_error_callback.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Assert(vacrel->indname == NULL);
vacrel->indname = pstrdup(RelationGetRelationName(indrel));
update_vacuum_error_info(vacrel, &saved_err_info,
VACUUM_ERRCB_PHASE_VACUUM_INDEX,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
istat = vac_bulkdel_one_index(&ivinfo, istat, (void *) vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
restore_vacuum_error_info(vacrel, &saved_err_info);
pfree(vacrel->indname);
vacrel->indname = NULL;
return istat;
}
/*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* lazy_cleanup_one_index() -- do post-vacuum cleanup for index relation.
*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Calls index AM's amvacuumcleanup routine. reltuples is the number
* of heap tuples and estimated_count is true if reltuples is an
* estimated value. See indexam.sgml for more info.
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
*
* Returns bulk delete stats derived from input stats
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
static IndexBulkDeleteResult *
lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
double reltuples, bool estimated_count,
LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
LVSavedErrInfo saved_err_info;
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
ivinfo.num_heap_tuples = reltuples;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
ivinfo.strategy = vacrel->bstrategy;
/*
* Update error traceback information.
*
* The index name is saved during this phase and restored immediately
* after this phase. See vacuum_error_callback.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Assert(vacrel->indname == NULL);
vacrel->indname = pstrdup(RelationGetRelationName(indrel));
update_vacuum_error_info(vacrel, &saved_err_info,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
istat = vac_cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
restore_vacuum_error_info(vacrel, &saved_err_info);
pfree(vacrel->indname);
vacrel->indname = NULL;
return istat;
}
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
/*
* should_attempt_truncation - should we attempt to truncate the heap?
*
* Don't even think about it unless we have a shot at releasing a goodly
* number of pages. Otherwise, the time taken isn't worth it.
*
* Also don't attempt it if wraparound failsafe is in effect. It's hard to
* predict how long lazy_truncate_heap will take. Don't take any chances.
* There is very little chance of truncation working out when the failsafe is
* in effect in any case. lazy_scan_prune makes the optimistic assumption
* that any LP_DEAD items it encounters will always be LP_UNUSED by the time
* we're called.
*
* Also don't attempt it if we are doing early pruning/vacuuming, because a
* scan which cannot find a truncated heap page cannot determine that the
* snapshot is too old to read that page.
*
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
* This is split out so that we can test whether truncation is going to be
* called for before we actually do it. If you change the logic here, be
* careful to depend only on fields that lazy_scan_heap updates on-the-fly.
*/
static bool
should_attempt_truncation(LVRelState *vacrel)
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
{
BlockNumber possibly_freeable;
if (!vacrel->do_rel_truncate || vacrel->failsafe_active)
return false;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION) &&
old_snapshot_threshold < 0)
Avoid useless truncation attempts during VACUUM. VACUUM can skip heap pages altogether when there's a run of consecutive pages that are all-visible according to the visibility map. This causes it to not update its nonempty_pages count, just as if those pages were empty, which means that at the end we will think they are candidates for deletion. Thus, we may take the table's AccessExclusive lock only to find that no pages are really truncatable. This usually causes no real problems on a master server, thanks to the lock being acquired only conditionally; but on hot-standby servers, the same lock must be acquired unconditionally which can result in unnecessary query cancellations. To improve matters, force examination of the table's last page whenever we reach there with a nonempty_pages count that would allow a truncation attempt. If it's not empty, we'll advance nonempty_pages and thereby prevent the truncation attempt. If we are unable to acquire cleanup lock on that page, there's no need to force it, unless we're doing an anti-wraparound vacuum. We can just check for tuples with a shared buffer lock and then give up. (When we are doing an anti-wraparound vacuum, and decide it's okay to skip the page because it contains no freezable tuples, this patch still improves matters because nonempty_pages is properly updated, which it was not before.) Since only the last page is special-cased in this way, we might attempt a truncation that will release many fewer pages than the normal heuristic would suggest; at worst, only one page would be truncated. But that seems all right, because the situation won't repeat during the next vacuum. The real problem with the old logic is that the useless truncation attempt happens every time we vacuum, so long as the state of the last few dozen pages doesn't change. This is a longstanding deficiency, but since the consequences aren't very severe in most scenarios, I'm not going to risk a back-patch. Jeff Janes and Tom Lane
2015-12-30 23:13:15 +01:00
return true;
else
return false;
}
/*
* lazy_truncate_heap - try to truncate off any empty pages at the end
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
lazy_truncate_heap(LVRelState *vacrel)
{
BlockNumber orig_rel_pages = vacrel->rel_pages;
BlockNumber new_rel_pages;
bool lock_waiter_detected;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
int lock_retry;
/* Report that we are now truncating */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_TRUNCATE);
/*
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
* Loop until no more truncating can be done.
*/
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
do
{
PGRUsage ru0;
pg_rusage_init(&ru0);
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
/*
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
* We need full exclusive lock on the relation in order to do
* truncation. If we can't get it, give up rather than waiting --- we
* don't want to block other backends, and we don't want to deadlock
* (which is quite possible considering we already hold a lower-grade
* lock).
Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 23:05:26 +02:00
*/
lock_waiter_detected = false;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
lock_retry = 0;
while (true)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (ConditionalLockRelation(vacrel->rel, AccessExclusiveLock))
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
break;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Check for interrupts while trying to (re-)acquire the exclusive
* lock.
*/
CHECK_FOR_INTERRUPTS();
if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
{
/*
* We failed to establish the lock in the specified number of
* retries. This means we give up truncating.
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
*/
ereport(elevel,
(errmsg("\"%s\": stopping truncate due to conflicting lock request",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname)));
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
return;
}
(void) WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL,
WAIT_EVENT_VACUUM_TRUNCATE);
ResetLatch(MyLatch);
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
}
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Now that we have exclusive lock, look to see if the rel has grown
* whilst we were vacuuming with non-exclusive lock. If so, give up;
* the newly added pages presumably contain non-deletable tuples.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
new_rel_pages = RelationGetNumberOfBlocks(vacrel->rel);
if (new_rel_pages != orig_rel_pages)
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
{
/*
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
* Note: we intentionally don't update vacrel->rel_pages with the
* new rel size here. If we did, it would amount to assuming that
* the new pages are empty, which is unlikely. Leaving the numbers
* alone amounts to assuming that the new pages have the same
* tuple density as existing ones, which is less unlikely.
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
UnlockRelation(vacrel->rel, AccessExclusiveLock);
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
return;
}
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Scan backwards from the end to verify that the end pages actually
* contain no tuples. This is *necessary*, not optional, because
* other backends could have added tuples to these pages whilst we
* were vacuuming.
*/
new_rel_pages = count_nondeletable_pages(vacrel, &lock_waiter_detected);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = new_rel_pages;
if (new_rel_pages >= orig_rel_pages)
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
{
/* can't do anything after all */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
UnlockRelation(vacrel->rel, AccessExclusiveLock);
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
return;
}
/*
* Okay to truncate.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
RelationTruncate(vacrel->rel, new_rel_pages);
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* We can release the exclusive lock as soon as we have truncated.
* Other backends can't safely access the relation until they have
* processed the smgr invalidation that smgrtruncate sent out ... but
* that should happen as part of standard invalidation processing once
* they acquire lock on the relation.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
UnlockRelation(vacrel->rel, AccessExclusiveLock);
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Update statistics. Here, it *is* correct to adjust rel_pages
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
vacrel->pages_removed += orig_rel_pages - new_rel_pages;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->rel_pages = new_rel_pages;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
ereport(elevel,
(errmsg("table \"%s\": truncated %u to %u pages",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname,
orig_rel_pages, new_rel_pages),
errdetail_internal("%s",
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
pg_rusage_show(&ru0))));
orig_rel_pages = new_rel_pages;
} while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
}
/*
* Rescan end pages to verify that they are (still) empty of tuples.
*
* Returns number of nondeletable pages (last nonempty page + 1).
*/
static BlockNumber
count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
{
BlockNumber blkno;
BlockNumber prefetchedUntil;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
instr_time starttime;
/* Initialize the starttime if we check for conflicting lock requests */
INSTR_TIME_SET_CURRENT(starttime);
/*
* Start checking blocks at what we believe relation end to be and move
* backwards. (Strange coding of loop control is needed because blkno is
* unsigned.) To make the scan faster, we prefetch a few blocks at a time
* in forward direction, so that OS-level readahead can kick in.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
blkno = vacrel->rel_pages;
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
while (blkno > vacrel->nonempty_pages)
{
Buffer buf;
Page page;
OffsetNumber offnum,
maxoff;
bool hastup;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
/*
* Check if another process requests a lock on our relation. We are
* holding an AccessExclusiveLock here, so they will be waiting. We
* only do this once per VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL, and we
* only check if that interval has elapsed once every 32 blocks to
* keep the number of system calls and actual shared lock table
* lookups to a minimum.
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
*/
if ((blkno % 32) == 0)
{
instr_time currenttime;
instr_time elapsed;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
INSTR_TIME_SET_CURRENT(currenttime);
elapsed = currenttime;
INSTR_TIME_SUBTRACT(elapsed, starttime);
if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000)
>= VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (LockHasWaitersRelation(vacrel->rel, AccessExclusiveLock))
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
{
ereport(elevel,
(errmsg("table \"%s\": suspending truncate due to conflicting lock request",
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->relname)));
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
*lock_waiter_detected = true;
Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck
2012-12-11 21:33:08 +01:00
return blkno;
}
starttime = currenttime;
}
}
/*
* We don't insert a vacuum delay point here, because we have an
* exclusive lock on the table which we want to hold for as short a
* time as possible. We still need to check for interrupts however.
*/
CHECK_FOR_INTERRUPTS();
2004-02-06 20:36:18 +01:00
blkno--;
/* If we haven't prefetched this lot yet, do so now. */
if (prefetchedUntil > blkno)
{
BlockNumber prefetchStart;
BlockNumber pblkno;
prefetchStart = blkno & ~(PREFETCH_SIZE - 1);
for (pblkno = prefetchStart; pblkno <= blkno; pblkno++)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
PrefetchBuffer(vacrel->rel, MAIN_FORKNUM, pblkno);
CHECK_FOR_INTERRUPTS();
}
prefetchedUntil = prefetchStart;
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
vacrel->bstrategy);
/* In this phase we only need shared access to the buffer */
LockBuffer(buf, BUFFER_LOCK_SHARE);
page = BufferGetPage(buf);
if (PageIsNew(page) || PageIsEmpty(page))
{
UnlockReleaseBuffer(buf);
continue;
}
hastup = false;
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
ItemId itemid;
itemid = PageGetItemId(page, offnum);
/*
* Note: any non-unused item should be taken as a reason to keep
* this page. Even an LP_DEAD item makes truncation unsafe, since
* we must not have cleaned out its index entries.
*/
if (ItemIdIsUsed(itemid))
{
hastup = true;
break; /* can stop scanning */
}
} /* scan along page */
UnlockReleaseBuffer(buf);
/* Done scanning if we found a tuple here */
if (hastup)
return blkno + 1;
}
/*
* If we fall out of the loop, all the previously-thought-to-be-empty
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
return vacrel->nonempty_pages;
}
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Returns the number of dead TIDs that VACUUM should allocate space to
* store, given a heap rel of size vacrel->rel_pages, and given current
* maintenance_work_mem setting (or current autovacuum_work_mem setting,
* when applicable).
*
* See the comments at the head of this file for rationale.
*/
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
static int
dead_items_max_items(LVRelState *vacrel)
{
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
int64 max_items;
int vac_work_mem = IsAutoVacuumWorkerProcess() &&
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
if (vacrel->nindexes > 0)
{
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
BlockNumber rel_pages = vacrel->rel_pages;
max_items = MAXDEADITEMS(vac_work_mem * 1024L);
max_items = Min(max_items, INT_MAX);
max_items = Min(max_items, MAXDEADITEMS(MaxAllocSize));
/* curious coding here to ensure the multiplication can't overflow */
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
if ((BlockNumber) (max_items / MaxHeapTuplesPerPage) > rel_pages)
max_items = rel_pages * MaxHeapTuplesPerPage;
/* stay sane if small maintenance_work_mem */
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
max_items = Max(max_items, MaxHeapTuplesPerPage);
}
else
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
{
/* One-pass case only stores a single heap page's TIDs at a time */
max_items = MaxHeapTuplesPerPage;
}
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
return (int) max_items;
}
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Also handles parallel initialization as part of allocating dead_items in
* DSM when required.
*/
static void
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
VacDeadItems *dead_items;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
int max_items;
max_items = dead_items_max_items(vacrel);
Assert(max_items >= MaxHeapTuplesPerPage);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
* least two indexes on a table.
*/
Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit 8523492d, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com
2021-04-07 21:37:45 +02:00
if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
{
/*
* Since parallel workers cannot access data in temporary tables, we
* can't perform parallel vacuum on them.
*/
if (RelationUsesLocalBuffers(vacrel->rel))
{
/*
* Give warning only if the user explicitly tries to perform a
* parallel vacuum on the temporary table.
*/
if (nworkers > 0)
ereport(WARNING,
(errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
vacrel->relname)));
}
else
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
max_items, elevel,
vacrel->bstrategy);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* If parallel mode started, dead_items space is allocated in DSM */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (ParallelVacuumIsActive(vacrel))
{
vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
return;
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
}
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
/* Serial VACUUM case */
dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items->max_items = max_items;
dead_items->num_items = 0;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
vacrel->dead_items = dead_items;
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
}
/*
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
* Perform cleanup for resources allocated in dead_items_alloc
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
*/
static void
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
dead_items_cleanup(LVRelState *vacrel)
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
{
if (!ParallelVacuumIsActive(vacrel))
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
{
/* Don't bother with pfree here */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
return;
vacuumlazy.c: Rename dead_tuples to dead_items. Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
2021-11-29 18:58:01 +01:00
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
/* End parallel mode */
parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
vacrel->pvs = NULL;
}
/*
* Check if every tuple in the given page is visible to all current and future
* transactions. Also return the visibility_cutoff_xid which is the highest
* xmin amongst the visible tuples. Set *all_frozen to true if every tuple
* on this page is frozen.
*
* This is a stripped down version of lazy_scan_prune(). If you change
* anything here, make sure that everything stays in sync. Note that an
* assertion calls us to verify that everybody still agrees. Be sure to avoid
* introducing new side-effects here.
*/
static bool
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid,
bool *all_frozen)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
offnum = OffsetNumberNext(offnum))
{
ItemId itemid;
HeapTupleData tuple;
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
continue;
ItemPointerSet(&(tuple.t_self), blockno, offnum);
/*
* Dead line pointers can have index pointers pointing to them. So
* they can't be treated as visible
*/
if (ItemIdIsDead(itemid))
{
all_visible = false;
*all_frozen = false;
break;
}
Assert(ItemIdIsNormal(itemid));
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
tuple.t_tableOid = RelationGetRelid(vacrel->rel);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
*all_frozen = false;
break;
}
/*
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (!TransactionIdPrecedes(xmin, vacrel->OldestXmin))
{
all_visible = false;
*all_frozen = false;
break;
}
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid))
*visibility_cutoff_xid = xmin;
/* Check whether this tuple is already frozen or not */
if (all_visible && *all_frozen &&
heap_tuple_needs_eventual_freeze(tuple.t_data))
*all_frozen = false;
}
break;
case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
all_visible = false;
*all_frozen = false;
break;
}
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->offnum = InvalidOffsetNumber;
return all_visible;
}
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_index_statistics(LVRelState *vacrel)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Relation *indrels = vacrel->indrels;
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
Assert(!IsInParallelMode());
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
for (int idx = 0; idx < nindexes; idx++)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
Relation indrel = indrels[idx];
IndexBulkDeleteResult *istat = indstats[idx];
if (istat == NULL || istat->estimated_count)
continue;
/* Update index statistics */
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vac_update_relstats(indrel,
istat->num_pages,
istat->num_index_tuples,
0,
false,
InvalidTransactionId,
InvalidMultiXactId,
false);
}
}
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
* vacuum. If you change this function for those phases, change
* parallel_vacuum_error_callback() as well.
*/
static void
vacuum_error_callback(void *arg)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
LVRelState *errinfo = arg;
switch (errinfo->phase)
{
case VACUUM_ERRCB_PHASE_SCAN_HEAP:
if (BlockNumberIsValid(errinfo->blkno))
{
if (OffsetNumberIsValid(errinfo->offnum))
2021-08-07 12:09:22 +02:00
errcontext("while scanning block %u offset %u of relation \"%s.%s\"",
errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
else
errcontext("while scanning block %u of relation \"%s.%s\"",
errinfo->blkno, errinfo->relnamespace, errinfo->relname);
}
else
errcontext("while scanning relation \"%s.%s\"",
errinfo->relnamespace, errinfo->relname);
break;
case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
if (BlockNumberIsValid(errinfo->blkno))
{
if (OffsetNumberIsValid(errinfo->offnum))
2021-08-07 12:09:22 +02:00
errcontext("while vacuuming block %u offset %u of relation \"%s.%s\"",
errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
else
errcontext("while vacuuming block %u of relation \"%s.%s\"",
errinfo->blkno, errinfo->relnamespace, errinfo->relname);
}
else
errcontext("while vacuuming relation \"%s.%s\"",
errinfo->relnamespace, errinfo->relname);
break;
case VACUUM_ERRCB_PHASE_VACUUM_INDEX:
errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
errinfo->indname, errinfo->relnamespace, errinfo->relname);
break;
case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
errinfo->indname, errinfo->relnamespace, errinfo->relname);
break;
case VACUUM_ERRCB_PHASE_TRUNCATE:
if (BlockNumberIsValid(errinfo->blkno))
errcontext("while truncating relation \"%s.%s\" to %u blocks",
errinfo->relnamespace, errinfo->relname, errinfo->blkno);
break;
case VACUUM_ERRCB_PHASE_UNKNOWN:
default:
return; /* do nothing; the errinfo may not be
* initialized */
}
}
/*
* Updates the information required for vacuum error callback. This also saves
* the current information which can be later restored via restore_vacuum_error_info.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
update_vacuum_error_info(LVRelState *vacrel, LVSavedErrInfo *saved_vacrel,
int phase, BlockNumber blkno, OffsetNumber offnum)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
if (saved_vacrel)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
saved_vacrel->offnum = vacrel->offnum;
saved_vacrel->blkno = vacrel->blkno;
saved_vacrel->phase = vacrel->phase;
}
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = blkno;
vacrel->offnum = offnum;
vacrel->phase = phase;
}
/*
* Restores the vacuum information saved via a prior call to update_vacuum_error_info.
*/
static void
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel)
{
Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com
2021-04-05 22:21:44 +02:00
vacrel->blkno = saved_vacrel->blkno;
vacrel->offnum = saved_vacrel->offnum;
vacrel->phase = saved_vacrel->phase;
}