/*-------------------------------------------------------------------------
 *
 * slru.h
 *		Simple LRU buffering for transaction status logfiles
 *
 * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/access/slru.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef SLRU_H
#define SLRU_H

#include "access/xlogdefs.h"
#include "storage/lwlock.h"
Defer flushing of SLRU files.
Previously, we called fsync() after writing out individual pg_xact,
pg_multixact and pg_commit_ts pages due to cache pressure, leading to
regular I/O stalls in user backends and recovery. Collapse requests for
the same file into a single system call as part of the next checkpoint,
as we already did for relation files, using the infrastructure developed
by commit 3eb77eba. This can cause a significant improvement to
recovery performance, especially when it's otherwise CPU-bound.
Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer
that it applies to all the SLRU mini-buffer-pools as well as the main
buffer pool. Rearrange things so that data collected in CheckpointStats
includes SLRU activity.
Also remove the Shutdown{CLOG,CommitTS,SUBTRANS,MultiXact}() functions,
because they were redundant after the shutdown checkpoint that
immediately precedes them. (I'm not sure if they were ever needed, but
they aren't now.)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (parts)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Discussion: https://postgr.es/m/CA+hUKGLJ=84YT+NvhkEEDAuUtVHMfQ9i-N7k_o50JmQ6Rpj_OQ@mail.gmail.com
2020-09-25 08:49:43 +02:00
#include "storage/sync.h"


/*
 * Define SLRU segment size. A page is the same BLCKSZ as is used everywhere
 * else in Postgres. The segment size can be chosen somewhat arbitrarily;
 * we make it 32 pages by default, or 256Kb, i.e. 1M transactions for CLOG
 * or 64K transactions for SUBTRANS.
 *
 * Note: because TransactionIds are 32 bits and wrap around at 0xFFFFFFFF,
 * page numbering also wraps around at 0xFFFFFFFF/xxxx_XACTS_PER_PAGE (where
 * xxxx is CLOG or SUBTRANS, respectively), and segment numbering at
 * 0xFFFFFFFF/xxxx_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
 * take no explicit notice of that fact in slru.c, except when comparing
 * segment and page numbers in SimpleLruTruncate (see PagePrecedes()).
 */
#define SLRU_PAGES_PER_SEGMENT	32
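/*
 * A minimal sketch of the arithmetic behind the segment-size comment above.
 * Every SKETCH_* name is hypothetical. It assumes the default BLCKSZ of 8192
 * bytes, 2 status bits per transaction for CLOG and one 4-byte parent xid per
 * transaction for SUBTRANS; other build settings give different numbers.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192				/* assumed default block size */
#define SKETCH_PAGES_PER_SEGMENT 32		/* mirrors SLRU_PAGES_PER_SEGMENT */

/* Assumed entry densities: 4 transactions per byte, 1 per 4 bytes */
#define SKETCH_CLOG_XACTS_PER_PAGE (SKETCH_BLCKSZ * 4)
#define SKETCH_SUBTRANS_XACTS_PER_PAGE (SKETCH_BLCKSZ / 4)

/* Size of one segment file in bytes */
static long
sketch_segment_bytes(void)
{
	return (long) SKETCH_BLCKSZ * SKETCH_PAGES_PER_SEGMENT;
}

/* Number of transactions covered by one segment file */
static long
sketch_xacts_per_segment(long xacts_per_page)
{
	return xacts_per_page * SKETCH_PAGES_PER_SEGMENT;
}

/* Page number that holds the entry for a given transaction id */
static int
sketch_page_of(uint32_t xid, uint32_t xacts_per_page)
{
	return (int) (xid / xacts_per_page);
}

/* Segment number that holds a given page */
static int
sketch_segment_of(int pageno)
{
	assert(pageno >= 0);
	return pageno / SKETCH_PAGES_PER_SEGMENT;
}
```

/*
 * Under these assumptions one segment is 32 * 8192 = 262144 bytes and covers
 * 1048576 CLOG transactions or 65536 SUBTRANS transactions, which matches
 * the "256Kb", "1M" and "64K" figures quoted in the comment.
 */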

/*
 * Page status codes. Note that these do not include the "dirty" bit.
 * page_dirty can be true only in the VALID or WRITE_IN_PROGRESS states;
 * in the latter case it implies that the page has been re-dirtied since
 * the write started.
 */
typedef enum
{
	SLRU_PAGE_EMPTY,			/* buffer is not in use */
	SLRU_PAGE_READ_IN_PROGRESS, /* page is being read in */
	SLRU_PAGE_VALID,			/* page is valid and not being written */
	SLRU_PAGE_WRITE_IN_PROGRESS /* page is being written out */
} SlruPageStatus;
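/*
 * A minimal sketch of the status/dirty rule stated in the comment above,
 * written as two standalone predicates. The SKETCH_* enum and both function
 * names are hypothetical stand-ins; slru.c does not necessarily express the
 * rule this way.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for SlruPageStatus so the sketch compiles on its own */
typedef enum
{
	SKETCH_PAGE_EMPTY,
	SKETCH_PAGE_READ_IN_PROGRESS,
	SKETCH_PAGE_VALID,
	SKETCH_PAGE_WRITE_IN_PROGRESS
} SketchPageStatus;

/* True if the (status, dirty) pair is allowed by the rule in the comment */
static bool
sketch_state_is_consistent(SketchPageStatus status, bool dirty)
{
	if (!dirty)
		return true;			/* a clean page is fine in any state */
	return status == SKETCH_PAGE_VALID ||
		status == SKETCH_PAGE_WRITE_IN_PROGRESS;
}

/*
 * True if the page was re-dirtied while its write was running, so the
 * copy that reached disk is already stale and another write is needed.
 */
static bool
sketch_redirtied_during_write(SketchPageStatus status, bool dirty)
{
	assert(sketch_state_is_consistent(status, dirty));
	return status == SKETCH_PAGE_WRITE_IN_PROGRESS && dirty;
}
```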

/*
 * Shared-memory state
 */
typedef struct SlruSharedData
{
	LWLock *ControlLock;

	/* Number of buffers managed by this SLRU structure */
	int num_slots;

	/*
	 * Arrays holding info for each buffer slot. Page number is undefined
	 * when status is EMPTY, as is page_lru_count.
	 */
	char **page_buffer;
	SlruPageStatus *page_status;
	bool *page_dirty;
	int *page_number;
	int *page_lru_count;
Improve management of SLRU statistics collection.
Instead of re-identifying which statistics bucket to use for a given
SLRU on every counter increment, do it once during shmem initialization.
This saves a fair number of cycles, and there's no real cost because
we could not have a bucket assignment that varies over time or across
backends anyway.
Also, get rid of the ill-considered decision to let pgstat.c pry
directly into SLRU's shared state; it's cleaner just to have slru.c
pass the stats bucket number.
In consequence of these changes, there's no longer any need to store
an SLRU's LWLock tranche info in shared memory, so get rid of that,
making this a net reduction in shmem consumption. (That partly
reverts fe702a7b3.)
This is basically code review for 28cac71bd, so I also cleaned up
some comments, removed a dangling extern declaration, fixed some
things that should be static and/or const, etc.
Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
2020-05-13 19:08:12 +02:00
	LWLockPadded *buffer_locks;

	/*
	 * Optional array of WAL flush LSNs associated with entries in the SLRU
	 * pages. If not zero/NULL, we must flush WAL before writing pages (true
	 * for pg_xact, false for multixact, pg_subtrans, pg_notify). group_lsn[]
	 * has lsn_groups_per_page entries per buffer slot, each containing the
	 * highest LSN known for a contiguous group of SLRU entries on that slot's
	 * page.
	 */
	XLogRecPtr *group_lsn;
	int lsn_groups_per_page;

	/*----------
	 * We mark a page "most recently used" by setting
	 *		page_lru_count[slotno] = ++cur_lru_count;
	 * The oldest page is therefore the one with the highest value of
	 *		cur_lru_count - page_lru_count[slotno]
	 * The counts will eventually wrap around, but this calculation still
	 * works as long as no page's age exceeds INT_MAX counts.
	 *----------
	 */
	int cur_lru_count;

	/*
	 * latest_page_number is the page number of the current end of the log;
	 * this is not critical data, since we use it only to avoid swapping out
	 * the latest page.
	 */
	int latest_page_number;

	/* SLRU's index for statistics purposes (might not be unique) */
	int slru_stats_idx;
} SlruSharedData;

typedef SlruSharedData *SlruShared;
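/*
 * A minimal sketch of the cur_lru_count / page_lru_count scheme described
 * inside SlruSharedData. SketchLru and both functions are hypothetical. The
 * sketch uses unsigned counters so that wraparound is well defined in
 * standard C, whereas the header declares the fields as int. It also ignores
 * page status and latest_page_number, which a real victim search would have
 * to take into account.
 */

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_NUM_SLOTS 4

typedef struct
{
	unsigned int page_lru_count[SKETCH_NUM_SLOTS];
	unsigned int cur_lru_count;
} SketchLru;

/* Mark a slot "most recently used" */
static void
sketch_mark_recently_used(SketchLru *lru, int slotno)
{
	assert(slotno >= 0 && slotno < SKETCH_NUM_SLOTS);
	lru->page_lru_count[slotno] = ++lru->cur_lru_count;
}

/* Return the slot with the largest age, i.e. the least recently used one */
static int
sketch_oldest_slot(const SketchLru *lru)
{
	int			best = 0;
	unsigned int best_age = 0;

	for (int slotno = 0; slotno < SKETCH_NUM_SLOTS; slotno++)
	{
		/* modular subtraction stays correct across counter wraparound */
		unsigned int age = lru->cur_lru_count - lru->page_lru_count[slotno];

		if (slotno == 0 || age > best_age)
		{
			best = slotno;
			best_age = age;
		}
	}
	return best;
}
```

/*
 * Starting the counter just below UINT_MAX and touching slots 0..3 in order
 * makes the counter wrap, yet slot 0 is still reported as the oldest because
 * only the difference between the two counts is compared.
 */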

/*
 * SlruCtlData is an unshared structure that points to the active information
 * in shared memory.
 */
typedef struct SlruCtlData
{
	SlruShared shared;

	/*
	 * Which sync handler function to use when handing sync requests over to
	 * the checkpointer. SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
	 */
	SyncRequestHandler sync_handler;

	/*
	 * Decide whether a page is "older" for truncation and as a hint for
	 * evicting pages in LRU order. Return true if every entry of the first
	 * argument is older than every entry of the second argument. Note that
	 * !PagePrecedes(a,b) && !PagePrecedes(b,a) need not imply a==b; it also
	 * arises when some entries are older and some are not. For SLRUs using
	 * SimpleLruTruncate(), this must use modular arithmetic. (For others,
	 * the behavior of this callback has no functional implications.) Use
	 * SlruPagePrecedesUnitTests() in SLRUs meeting its criteria.
	 */
	bool (*PagePrecedes) (int, int);

	/*
	 * Dir is set during SimpleLruInit and does not change thereafter. Since
	 * it's always the same, it doesn't need to be in shared memory.
	 */
	char Dir[64];
} SlruCtlData;

typedef SlruCtlData *SlruCtl;
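/*
 * A minimal sketch of a PagePrecedes-style callback that uses modular
 * arithmetic, as the comment in SlruCtlData requires for SLRUs that call
 * SimpleLruTruncate(). It is a simplified illustration, not the comparator of
 * any particular PostgreSQL SLRU: the names are hypothetical, entries are
 * plain 32-bit ids with no reserved values, and 32768 entries per page is an
 * assumed figure.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENTRIES_PER_PAGE 32768u	/* assumed entries per page */

/* Circular "a is older than b": the wrapped difference a - b is negative */
static bool
sketch_id_precedes(uint32_t a, uint32_t b)
{
	return (uint32_t) (a - b) >= 0x80000000u;
}

/*
 * True only if every entry on page1 is older than every entry on page2.
 * Comparing the first entry of page1 against both ends of page2 is enough
 * here, because pages are aligned runs of SKETCH_ENTRIES_PER_PAGE ids.
 */
static bool
sketch_page_precedes(int page1, int page2)
{
	uint32_t	first1 = (uint32_t) page1 * SKETCH_ENTRIES_PER_PAGE;
	uint32_t	first2 = (uint32_t) page2 * SKETCH_ENTRIES_PER_PAGE;
	uint32_t	last2 = first2 + SKETCH_ENTRIES_PER_PAGE - 1;

	return sketch_id_precedes(first1, first2) &&
		sketch_id_precedes(first1, last2);
}
```

/*
 * With these definitions the last page before wraparound (131071) precedes
 * page 0, while pages 0 and 65536, which lie half the id space apart, precede
 * each other in neither direction even though they are different pages.
 */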


extern Size SimpleLruShmemSize(int nslots, int nlsns);
extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
                          LWLock *ctllock, const char *subdir, int tranche_id,
                          SyncRequestHandler sync_handler);
extern int SimpleLruZeroPage(SlruCtl ctl, int pageno);
extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
                             TransactionId xid);
extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
                                      TransactionId xid);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
#ifdef USE_ASSERT_CHECKING
extern void SlruPagePrecedesUnitTests(SlruCtl ctl, int per_page);
#else
#define SlruPagePrecedesUnitTests(ctl, per_page) do {} while (0)
#endif
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);

typedef bool (*SlruScanCallback) (SlruCtl ctl, char *filename, int segpage,
                                  void *data);
extern bool SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data);
Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.
We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.
This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
this has gotten much easier
During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
segments. This happened to be hard or impossible to hit due to the
interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
that doesn't work for offsets that haven't yet been flushed to
disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
of emergency vacuuming. The problem remains in
pg_get_multixact_members(), but that's quite harmless.
For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.
For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.
A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
extern void SlruDeleteSegment(SlruCtl ctl, int segno);

extern int SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path);

/* SlruScanDirectory public callbacks */
extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
                                        int segpage, void *data);
extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
                                   void *data);

#endif							/* SLRU_H */