Prefetch data referenced by the WAL, take II.

Introduce a new GUC recovery_prefetch. When enabled, look ahead in the
WAL and try to initiate asynchronous reading of referenced data blocks
that are not yet cached in our buffer pool. For now, this is done with
posix_fadvise(), which has several caveats. Since not all OSes have
that system call, "try" is provided so that it can be enabled where
available. Better mechanisms for asynchronous I/O are possible in later
work.

Set to "try" for now for test coverage. Default setting to be finalized
before release.

The GUC wal_decode_buffer_size limits the distance we can look ahead in
bytes of decoded data.

The existing GUC maintenance_io_concurrency is used to limit the number
of concurrent I/Os allowed, based on pessimistic heuristics used to
infer that I/Os have begun and completed. We'll also not look more than
maintenance_io_concurrency * 4 block references ahead.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version)
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version)
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
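
To illustrate the mechanism described in the message above, here is a minimal sketch of issuing a read-ahead hint with posix_fadvise(). It is not the code from this commit (the prefetcher's own file diff is suppressed on this page). The names prefetch_mode, prefetch_block and SKETCH_BLOCK_SIZE are hypothetical, and the handling of "on" when the call is unavailable is an assumption made only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* ask <fcntl.h> to declare posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the three settings named in the message. */
typedef enum
{
	PREFETCH_OFF,
	PREFETCH_ON,
	PREFETCH_TRY
} prefetch_mode;

#define SKETCH_BLOCK_SIZE 8192	/* assumed block size for this sketch */

/*
 * Hint to the kernel that one block of an open file will be needed soon.
 * Returns 1 if a hint was issued, 0 if prefetching is off or unavailable
 * under "try", and -1 on error (including "on" without OS support, which
 * is an assumption of this sketch).
 */
static int
prefetch_block(int fd, unsigned block_no, prefetch_mode mode)
{
	assert(fd >= 0);

	if (mode == PREFETCH_OFF)
		return 0;

#if defined(POSIX_FADV_WILLNEED)
	{
		off_t		offset = (off_t) block_no * SKETCH_BLOCK_SIZE;
		int			rc;

		/* posix_fadvise() returns 0 on success or an error number. */
		rc = posix_fadvise(fd, offset, SKETCH_BLOCK_SIZE, POSIX_FADV_WILLNEED);
		return rc == 0 ? 1 : -1;
	}
#else
	(void) block_no;
	return mode == PREFETCH_TRY ? 0 : -1;
#endif
}
```

The hint is advisory only: a successful return says the request was accepted, not that the block has been read, which is one reason the message speaks of heuristics for inferring I/O completion.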
parent 9553b4115f
commit 5dc0418fab
|
@ -3657,6 +3657,70 @@ include_dir 'conf.d'
|
||||||
</variablelist>
|
</variablelist>
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
|
<sect2 id="runtime-config-wal-recovery">
|
||||||
|
|
||||||
|
<title>Recovery</title>
|
||||||
|
|
||||||
|
<indexterm>
|
||||||
|
<primary>configuration</primary>
|
||||||
|
<secondary>of recovery</secondary>
|
||||||
|
<tertiary>general settings</tertiary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This section describes the settings that apply to recovery in general,
|
||||||
|
affecting crash recovery, streaming replication and archive-based
|
||||||
|
replication.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
|
||||||
|
<variablelist>
|
||||||
|
<varlistentry id="guc-recovery-prefetch" xreflabel="recovery_prefetch">
|
||||||
|
<term><varname>recovery_prefetch</varname> (<type>enum</type>)
|
||||||
|
<indexterm>
|
||||||
|
<primary><varname>recovery_prefetch</varname> configuration parameter</primary>
|
||||||
|
</indexterm>
|
||||||
|
</term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Whether to try to prefetch blocks that are referenced in the WAL that
|
||||||
|
are not yet in the buffer pool, during recovery. Valid values are
|
||||||
|
        <literal>off</literal>, <literal>on</literal> and
|
||||||
|
        <literal>try</literal> (the default). The setting <literal>try</literal> enables
|
||||||
|
prefetching only if the operating system provides the
|
||||||
|
<function>posix_fadvise</function> function, which is currently used
|
||||||
|
to implement prefetching. Note that some operating systems provide the
|
||||||
|
function, but it doesn't do anything.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Prefetching blocks that will soon be needed can reduce I/O wait times
|
||||||
|
during recovery with some workloads.
|
||||||
|
See also the <xref linkend="guc-wal-decode-buffer-size"/> and
|
||||||
|
<xref linkend="guc-maintenance-io-concurrency"/> settings, which limit
|
||||||
|
prefetching activity.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry id="guc-wal-decode-buffer-size" xreflabel="wal_decode_buffer_size">
|
||||||
|
<term><varname>wal_decode_buffer_size</varname> (<type>integer</type>)
|
||||||
|
<indexterm>
|
||||||
|
<primary><varname>wal_decode_buffer_size</varname> configuration parameter</primary>
|
||||||
|
</indexterm>
|
||||||
|
</term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
A limit on how far ahead the server can look in the WAL, to find
|
||||||
|
blocks to prefetch. If this value is specified without units, it is
|
||||||
|
taken as bytes.
|
||||||
|
The default is 512kB.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
</variablelist>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
<sect2 id="runtime-config-wal-archive-recovery">
|
<sect2 id="runtime-config-wal-archive-recovery">
|
||||||
|
|
||||||
<title>Archive Recovery</title>
|
<title>Archive Recovery</title>
|
||||||
|
|
|
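
The documentation hunk above says that wal_decode_buffer_size and maintenance_io_concurrency limit prefetching activity, and the commit message adds a cap of maintenance_io_concurrency * 4 block references. The following sketch shows how those three limits might combine into a single check. The struct and function names are hypothetical, and treating the three limits as simple independent comparisons is a simplifying assumption rather than the commit's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two settings named in the documentation. */
typedef struct
{
	int			maintenance_io_concurrency;	/* max concurrent I/Os */
	size_t		wal_decode_buffer_size;		/* max look-ahead, in bytes */
} lookahead_limits;

/* Hypothetical snapshot of how far ahead the prefetcher currently is. */
typedef struct
{
	int			io_depth;		/* prefetches started, not known complete */
	int			block_distance;	/* block references ahead of replay */
	size_t		wal_distance;	/* bytes ahead of replay */
} lookahead_state;

/* Returns true if none of the three limits has been reached yet. */
static bool
can_look_further(const lookahead_limits *lim, const lookahead_state *st)
{
	assert(lim->maintenance_io_concurrency >= 0);

	if (st->io_depth >= lim->maintenance_io_concurrency)
		return false;			/* too many I/Os believed in flight */
	if (st->block_distance >= lim->maintenance_io_concurrency * 4)
		return false;			/* "* 4" cap from the commit message */
	if (st->wal_distance >= lim->wal_decode_buffer_size)
		return false;			/* decode buffer budget used up */
	return true;
}
```

With maintenance_io_concurrency set to 10 and the 512kB default buffer, this sketch stops looking ahead at 10 in-flight I/Os, 40 block references, or 524288 bytes, whichever comes first.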
@ -328,6 +328,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
|
||||||
</entry>
|
</entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
|
<row>
|
||||||
|
<entry><structname>pg_stat_recovery_prefetch</structname><indexterm><primary>pg_stat_recovery_prefetch</primary></indexterm></entry>
|
||||||
|
<entry>Only one row, showing statistics about blocks prefetched during recovery.
|
||||||
|
See <xref linkend="pg-stat-recovery-prefetch-view"/> for details.
|
||||||
|
</entry>
|
||||||
|
</row>
|
||||||
|
|
||||||
<row>
|
<row>
|
||||||
<entry><structname>pg_stat_subscription</structname><indexterm><primary>pg_stat_subscription</primary></indexterm></entry>
|
<entry><structname>pg_stat_subscription</structname><indexterm><primary>pg_stat_subscription</primary></indexterm></entry>
|
||||||
<entry>At least one row per subscription, showing information about
|
<entry>At least one row per subscription, showing information about
|
||||||
|
@ -2979,6 +2986,78 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
|
||||||
copy of the subscribed tables.
|
copy of the subscribed tables.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<table id="pg-stat-recovery-prefetch-view" xreflabel="pg_stat_recovery_prefetch">
|
||||||
|
<title><structname>pg_stat_recovery_prefetch</structname> View</title>
|
||||||
|
<tgroup cols="3">
|
||||||
|
<thead>
|
||||||
|
<row>
|
||||||
|
<entry>Column</entry>
|
||||||
|
<entry>Type</entry>
|
||||||
|
<entry>Description</entry>
|
||||||
|
</row>
|
||||||
|
</thead>
|
||||||
|
|
||||||
|
<tbody>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>prefetch</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks prefetched because they were not in the buffer pool</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>hit</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks not prefetched because they were already in the buffer pool</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>skip_init</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks not prefetched because they would be zero-initialized</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>skip_new</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks not prefetched because they didn't exist yet</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>skip_fpw</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks not prefetched because a full page image was included in the WAL</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>skip_rep</structfield></entry>
|
||||||
|
<entry><type>bigint</type></entry>
|
||||||
|
<entry>Number of blocks not prefetched because they were already recently prefetched</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>wal_distance</structfield></entry>
|
||||||
|
<entry><type>integer</type></entry>
|
||||||
|
<entry>How many bytes ahead the prefetcher is looking</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>block_distance</structfield></entry>
|
||||||
|
<entry><type>integer</type></entry>
|
||||||
|
<entry>How many blocks ahead the prefetcher is looking</entry>
|
||||||
|
</row>
|
||||||
|
<row>
|
||||||
|
<entry><structfield>io_depth</structfield></entry>
|
||||||
|
<entry><type>integer</type></entry>
|
||||||
|
<entry>How many prefetches have been initiated but are not yet known to have completed</entry>
|
||||||
|
</row>
|
||||||
|
</tbody>
|
||||||
|
</tgroup>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <structname>pg_stat_recovery_prefetch</structname> view will contain
|
||||||
|
only one row. It is filled with nulls if recovery has not run or
|
||||||
|
<xref linkend="guc-recovery-prefetch"/> is not enabled. The
|
||||||
|
columns <structfield>wal_distance</structfield>,
|
||||||
|
<structfield>block_distance</structfield>
|
||||||
|
and <structfield>io_depth</structfield> show current values, and the
|
||||||
|
other columns show cumulative counters that can be reset
|
||||||
|
with the <function>pg_stat_reset_shared</function> function.
|
||||||
|
</para>
|
||||||
|
|
||||||
<table id="pg-stat-subscription" xreflabel="pg_stat_subscription">
|
<table id="pg-stat-subscription" xreflabel="pg_stat_subscription">
|
||||||
<title><structname>pg_stat_subscription</structname> View</title>
|
<title><structname>pg_stat_subscription</structname> View</title>
|
||||||
<tgroup cols="1">
|
<tgroup cols="1">
|
||||||
|
@ -5199,8 +5278,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
|
||||||
all the counters shown in
|
all the counters shown in
|
||||||
the <structname>pg_stat_bgwriter</structname>
|
the <structname>pg_stat_bgwriter</structname>
|
||||||
view, <literal>archiver</literal> to reset all the counters shown in
|
view, <literal>archiver</literal> to reset all the counters shown in
|
||||||
the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
|
the <structname>pg_stat_archiver</structname> view,
|
||||||
to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
|
<literal>wal</literal> to reset all the counters shown in the
|
||||||
|
<structname>pg_stat_wal</structname> view or
|
||||||
|
<literal>recovery_prefetch</literal> to reset all the counters shown
|
||||||
|
in the <structname>pg_stat_recovery_prefetch</structname> view.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
This function is restricted to superusers by default, but other users
|
This function is restricted to superusers by default, but other users
|
||||||
|
|
|
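
The monitoring hunks above distinguish cumulative counters, which pg_stat_reset_shared can reset, from the three columns that report current values. A small sketch of that distinction follows. The struct mirrors the documented column names and types (bigint as int64_t, integer as int32_t), but the struct itself and the reset helper are hypothetical and do not reflect the server's shared-memory layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the pg_stat_recovery_prefetch columns. */
typedef struct
{
	/* cumulative counters (bigint in the view) */
	int64_t		prefetch;
	int64_t		hit;
	int64_t		skip_init;
	int64_t		skip_new;
	int64_t		skip_fpw;
	int64_t		skip_rep;
	/* current values (integer in the view) */
	int32_t		wal_distance;
	int32_t		block_distance;
	int32_t		io_depth;
} recovery_prefetch_stats;

/* Zero the cumulative counters only; current values are left untouched. */
static void
reset_prefetch_counters(recovery_prefetch_stats *s)
{
	assert(s != 0);

	s->prefetch = 0;
	s->hit = 0;
	s->skip_init = 0;
	s->skip_new = 0;
	s->skip_fpw = 0;
	s->skip_rep = 0;
}
```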
@ -803,6 +803,18 @@
|
||||||
counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
|
counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
|
||||||
in <structname>pg_stat_wal</structname>, respectively.
|
in <structname>pg_stat_wal</structname>, respectively.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <xref linkend="guc-recovery-prefetch"/> parameter can be used to reduce
|
||||||
|
I/O wait times during recovery by instructing the kernel to initiate reads
|
||||||
|
of disk blocks that will soon be needed but are not currently in
|
||||||
|
<productname>PostgreSQL</productname>'s buffer pool.
|
||||||
|
The <xref linkend="guc-maintenance-io-concurrency"/> and
|
||||||
|
<xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
|
||||||
|
concurrency and distance, respectively. By default, it is set to
|
||||||
|
   <literal>try</literal>, which enables the feature on systems where
|
||||||
|
<function>posix_fadvise</function> is available.
|
||||||
|
</para>
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
<sect1 id="wal-internals">
|
<sect1 id="wal-internals">
|
||||||
|
|
|
@ -31,6 +31,7 @@ OBJS = \
|
||||||
xlogarchive.o \
|
xlogarchive.o \
|
||||||
xlogfuncs.o \
|
xlogfuncs.o \
|
||||||
xloginsert.o \
|
xloginsert.o \
|
||||||
|
xlogprefetcher.o \
|
||||||
xlogreader.o \
|
xlogreader.o \
|
||||||
xlogrecovery.o \
|
xlogrecovery.o \
|
||||||
xlogutils.o
|
xlogutils.o
|
||||||
|
|
|
@ -59,6 +59,7 @@
|
||||||
#include "access/xlog_internal.h"
|
#include "access/xlog_internal.h"
|
||||||
#include "access/xlogarchive.h"
|
#include "access/xlogarchive.h"
|
||||||
#include "access/xloginsert.h"
|
#include "access/xloginsert.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "access/xlogreader.h"
|
#include "access/xlogreader.h"
|
||||||
#include "access/xlogrecovery.h"
|
#include "access/xlogrecovery.h"
|
||||||
#include "access/xlogutils.h"
|
#include "access/xlogutils.h"
|
||||||
|
@ -133,6 +134,7 @@ int CommitDelay = 0; /* precommit delay in microseconds */
|
||||||
int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
|
int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
|
||||||
int wal_retrieve_retry_interval = 5000;
|
int wal_retrieve_retry_interval = 5000;
|
||||||
int max_slot_wal_keep_size_mb = -1;
|
int max_slot_wal_keep_size_mb = -1;
|
||||||
|
int wal_decode_buffer_size = 512 * 1024;
|
||||||
bool track_wal_io_timing = false;
|
bool track_wal_io_timing = false;
|
||||||
|
|
||||||
#ifdef WAL_DEBUG
|
#ifdef WAL_DEBUG
|
||||||
|
|
File diff suppressed because it is too large
|
@ -1727,6 +1727,8 @@ DecodeXLogRecord(XLogReaderState *state,
|
||||||
blk->has_image = ((fork_flags & BKPBLOCK_HAS_IMAGE) != 0);
|
blk->has_image = ((fork_flags & BKPBLOCK_HAS_IMAGE) != 0);
|
||||||
blk->has_data = ((fork_flags & BKPBLOCK_HAS_DATA) != 0);
|
blk->has_data = ((fork_flags & BKPBLOCK_HAS_DATA) != 0);
|
||||||
|
|
||||||
|
blk->prefetch_buffer = InvalidBuffer;
|
||||||
|
|
||||||
COPY_HEADER_FIELD(&blk->data_len, sizeof(uint16));
|
COPY_HEADER_FIELD(&blk->data_len, sizeof(uint16));
|
||||||
/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
|
/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
|
||||||
if (blk->has_data && blk->data_len == 0)
|
if (blk->has_data && blk->data_len == 0)
|
||||||
|
@ -1925,14 +1927,29 @@ err:
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Returns information about the block that a block reference refers to.
|
* Returns information about the block that a block reference refers to.
|
||||||
*
|
* See XLogRecGetBlockTagExtended().
|
||||||
* If the WAL record contains a block reference with the given ID, *rnode,
|
|
||||||
* *forknum, and *blknum are filled in (if not NULL), and returns true.
|
|
||||||
* Otherwise returns false.
|
|
||||||
*/
|
*/
|
||||||
bool
|
bool
|
||||||
XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
|
XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
|
||||||
RelFileNode *rnode, ForkNumber *forknum, BlockNumber *blknum)
|
RelFileNode *rnode, ForkNumber *forknum, BlockNumber *blknum)
|
||||||
|
{
|
||||||
|
return XLogRecGetBlockTagExtended(record, block_id, rnode, forknum, blknum,
|
||||||
|
NULL);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Returns information about the block that a block reference refers to,
|
||||||
|
* optionally including the buffer that the block may already be in.
|
||||||
|
*
|
||||||
|
* If the WAL record contains a block reference with the given ID, *rnode,
|
||||||
|
* *forknum, *blknum and *prefetch_buffer are filled in (if not NULL), and
|
||||||
|
* returns true. Otherwise returns false.
|
||||||
|
*/
|
||||||
|
bool
|
||||||
|
XLogRecGetBlockTagExtended(XLogReaderState *record, uint8 block_id,
|
||||||
|
RelFileNode *rnode, ForkNumber *forknum,
|
||||||
|
BlockNumber *blknum,
|
||||||
|
Buffer *prefetch_buffer)
|
||||||
{
|
{
|
||||||
DecodedBkpBlock *bkpb;
|
DecodedBkpBlock *bkpb;
|
||||||
|
|
||||||
|
@ -1947,6 +1964,8 @@ XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
|
||||||
*forknum = bkpb->forknum;
|
*forknum = bkpb->forknum;
|
||||||
if (blknum)
|
if (blknum)
|
||||||
*blknum = bkpb->blkno;
|
*blknum = bkpb->blkno;
|
||||||
|
if (prefetch_buffer)
|
||||||
|
*prefetch_buffer = bkpb->prefetch_buffer;
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -36,6 +36,7 @@
|
||||||
#include "access/xact.h"
|
#include "access/xact.h"
|
||||||
#include "access/xlog_internal.h"
|
#include "access/xlog_internal.h"
|
||||||
#include "access/xlogarchive.h"
|
#include "access/xlogarchive.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "access/xlogreader.h"
|
#include "access/xlogreader.h"
|
||||||
#include "access/xlogrecovery.h"
|
#include "access/xlogrecovery.h"
|
||||||
#include "access/xlogutils.h"
|
#include "access/xlogutils.h"
|
||||||
|
@ -183,6 +184,9 @@ static bool doRequestWalReceiverReply;
|
||||||
/* XLogReader object used to parse the WAL records */
|
/* XLogReader object used to parse the WAL records */
|
||||||
static XLogReaderState *xlogreader = NULL;
|
static XLogReaderState *xlogreader = NULL;
|
||||||
|
|
||||||
|
/* XLogPrefetcher object used to consume WAL records with read-ahead */
|
||||||
|
static XLogPrefetcher *xlogprefetcher = NULL;
|
||||||
|
|
||||||
/* Parameters passed down from ReadRecord to the XLogPageRead callback. */
|
/* Parameters passed down from ReadRecord to the XLogPageRead callback. */
|
||||||
typedef struct XLogPageReadPrivate
|
typedef struct XLogPageReadPrivate
|
||||||
{
|
{
|
||||||
|
@ -404,18 +408,21 @@ static void recoveryPausesHere(bool endOfRecovery);
|
||||||
static bool recoveryApplyDelay(XLogReaderState *record);
|
static bool recoveryApplyDelay(XLogReaderState *record);
|
||||||
static void ConfirmRecoveryPaused(void);
|
static void ConfirmRecoveryPaused(void);
|
||||||
|
|
||||||
static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
|
static XLogRecord *ReadRecord(XLogPrefetcher *xlogprefetcher,
|
||||||
int emode, bool fetching_ckpt, TimeLineID replayTLI);
|
int emode, bool fetching_ckpt,
|
||||||
|
TimeLineID replayTLI);
|
||||||
|
|
||||||
static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
|
static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
|
||||||
int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
|
int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
|
||||||
static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
static XLogPageReadResult WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
|
||||||
bool fetching_ckpt,
|
bool randAccess,
|
||||||
XLogRecPtr tliRecPtr,
|
bool fetching_ckpt,
|
||||||
TimeLineID replayTLI,
|
XLogRecPtr tliRecPtr,
|
||||||
XLogRecPtr replayLSN);
|
TimeLineID replayTLI,
|
||||||
|
XLogRecPtr replayLSN,
|
||||||
|
bool nonblocking);
|
||||||
static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
|
static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
|
||||||
static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
|
static XLogRecord *ReadCheckpointRecord(XLogPrefetcher *xlogprefetcher, XLogRecPtr RecPtr,
|
||||||
int whichChkpt, bool report, TimeLineID replayTLI);
|
int whichChkpt, bool report, TimeLineID replayTLI);
|
||||||
static bool rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN);
|
static bool rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN);
|
||||||
static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
|
static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
|
||||||
|
@ -561,6 +568,15 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
|
||||||
errdetail("Failed while allocating a WAL reading processor.")));
|
errdetail("Failed while allocating a WAL reading processor.")));
|
||||||
xlogreader->system_identifier = ControlFile->system_identifier;
|
xlogreader->system_identifier = ControlFile->system_identifier;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Set the WAL decode buffer size. This limits how far ahead we can read
|
||||||
|
* in the WAL.
|
||||||
|
*/
|
||||||
|
XLogReaderSetDecodeBuffer(xlogreader, NULL, wal_decode_buffer_size);
|
||||||
|
|
||||||
|
/* Create a WAL prefetcher. */
|
||||||
|
xlogprefetcher = XLogPrefetcherAllocate(xlogreader);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Allocate two page buffers dedicated to WAL consistency checks. We do
|
* Allocate two page buffers dedicated to WAL consistency checks. We do
|
||||||
* it this way, rather than just making static arrays, for two reasons:
|
* it this way, rather than just making static arrays, for two reasons:
|
||||||
|
@ -589,7 +605,8 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
|
||||||
* When a backup_label file is present, we want to roll forward from
|
* When a backup_label file is present, we want to roll forward from
|
||||||
* the checkpoint it identifies, rather than using pg_control.
|
* the checkpoint it identifies, rather than using pg_control.
|
||||||
*/
|
*/
|
||||||
record = ReadCheckpointRecord(xlogreader, CheckPointLoc, 0, true, CheckPointTLI);
|
record = ReadCheckpointRecord(xlogprefetcher, CheckPointLoc, 0, true,
|
||||||
|
CheckPointTLI);
|
||||||
if (record != NULL)
|
if (record != NULL)
|
||||||
{
|
{
|
||||||
memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
|
memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
|
||||||
|
@ -607,8 +624,8 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
|
||||||
*/
|
*/
|
||||||
if (checkPoint.redo < CheckPointLoc)
|
if (checkPoint.redo < CheckPointLoc)
|
||||||
{
|
{
|
||||||
XLogBeginRead(xlogreader, checkPoint.redo);
|
XLogPrefetcherBeginRead(xlogprefetcher, checkPoint.redo);
|
||||||
if (!ReadRecord(xlogreader, LOG, false,
|
if (!ReadRecord(xlogprefetcher, LOG, false,
|
||||||
checkPoint.ThisTimeLineID))
|
checkPoint.ThisTimeLineID))
|
||||||
ereport(FATAL,
|
ereport(FATAL,
|
||||||
(errmsg("could not find redo location referenced by checkpoint record"),
|
(errmsg("could not find redo location referenced by checkpoint record"),
|
||||||
|
@ -727,7 +744,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
|
||||||
CheckPointTLI = ControlFile->checkPointCopy.ThisTimeLineID;
|
CheckPointTLI = ControlFile->checkPointCopy.ThisTimeLineID;
|
||||||
RedoStartLSN = ControlFile->checkPointCopy.redo;
|
RedoStartLSN = ControlFile->checkPointCopy.redo;
|
||||||
RedoStartTLI = ControlFile->checkPointCopy.ThisTimeLineID;
|
RedoStartTLI = ControlFile->checkPointCopy.ThisTimeLineID;
|
||||||
record = ReadCheckpointRecord(xlogreader, CheckPointLoc, 1, true,
|
record = ReadCheckpointRecord(xlogprefetcher, CheckPointLoc, 1, true,
|
||||||
CheckPointTLI);
|
CheckPointTLI);
|
||||||
if (record != NULL)
|
if (record != NULL)
|
||||||
{
|
{
|
||||||
|
@ -1413,8 +1430,8 @@ FinishWalRecovery(void)
|
||||||
lastRec = XLogRecoveryCtl->lastReplayedReadRecPtr;
|
lastRec = XLogRecoveryCtl->lastReplayedReadRecPtr;
|
||||||
lastRecTLI = XLogRecoveryCtl->lastReplayedTLI;
|
lastRecTLI = XLogRecoveryCtl->lastReplayedTLI;
|
||||||
}
|
}
|
||||||
XLogBeginRead(xlogreader, lastRec);
|
XLogPrefetcherBeginRead(xlogprefetcher, lastRec);
|
||||||
(void) ReadRecord(xlogreader, PANIC, false, lastRecTLI);
|
(void) ReadRecord(xlogprefetcher, PANIC, false, lastRecTLI);
|
||||||
endOfLog = xlogreader->EndRecPtr;
|
endOfLog = xlogreader->EndRecPtr;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -1503,6 +1520,9 @@ ShutdownWalRecovery(void)
|
||||||
{
|
{
|
||||||
char recoveryPath[MAXPGPATH];
|
char recoveryPath[MAXPGPATH];
|
||||||
|
|
||||||
|
/* Final update of pg_stat_recovery_prefetch. */
|
||||||
|
XLogPrefetcherComputeStats(xlogprefetcher);
|
||||||
|
|
||||||
/* Shut down xlogreader */
|
/* Shut down xlogreader */
|
||||||
if (readFile >= 0)
|
if (readFile >= 0)
|
||||||
{
|
{
|
||||||
|
@ -1510,6 +1530,7 @@ ShutdownWalRecovery(void)
|
||||||
readFile = -1;
|
readFile = -1;
|
||||||
}
|
}
|
||||||
XLogReaderFree(xlogreader);
|
XLogReaderFree(xlogreader);
|
||||||
|
XLogPrefetcherFree(xlogprefetcher);
|
||||||
|
|
||||||
if (ArchiveRecoveryRequested)
|
if (ArchiveRecoveryRequested)
|
||||||
{
|
{
|
||||||
|
@ -1593,15 +1614,15 @@ PerformWalRecovery(void)
|
||||||
{
|
{
|
||||||
/* back up to find the record */
|
/* back up to find the record */
|
||||||
replayTLI = RedoStartTLI;
|
replayTLI = RedoStartTLI;
|
||||||
XLogBeginRead(xlogreader, RedoStartLSN);
|
XLogPrefetcherBeginRead(xlogprefetcher, RedoStartLSN);
|
||||||
record = ReadRecord(xlogreader, PANIC, false, replayTLI);
|
record = ReadRecord(xlogprefetcher, PANIC, false, replayTLI);
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
/* just have to read next record after CheckPoint */
|
/* just have to read next record after CheckPoint */
|
||||||
Assert(xlogreader->ReadRecPtr == CheckPointLoc);
|
Assert(xlogreader->ReadRecPtr == CheckPointLoc);
|
||||||
replayTLI = CheckPointTLI;
|
replayTLI = CheckPointTLI;
|
||||||
record = ReadRecord(xlogreader, LOG, false, replayTLI);
|
record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (record != NULL)
|
if (record != NULL)
|
||||||
|
@ -1710,7 +1731,7 @@ PerformWalRecovery(void)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Else, try to fetch the next WAL record */
|
/* Else, try to fetch the next WAL record */
|
||||||
record = ReadRecord(xlogreader, LOG, false, replayTLI);
|
record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
|
||||||
} while (record != NULL);
|
} while (record != NULL);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -1921,6 +1942,9 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord *record, TimeLineID *repl
|
||||||
*/
|
*/
|
||||||
if (AllowCascadeReplication())
|
if (AllowCascadeReplication())
|
||||||
WalSndWakeup();
|
WalSndWakeup();
|
||||||
|
|
||||||
|
/* Reset the prefetcher. */
|
||||||
|
XLogPrefetchReconfigure();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -2305,7 +2329,8 @@ verifyBackupPageConsistency(XLogReaderState *record)
|
||||||
* temporary page.
|
* temporary page.
|
||||||
*/
|
*/
|
||||||
buf = XLogReadBufferExtended(rnode, forknum, blkno,
|
buf = XLogReadBufferExtended(rnode, forknum, blkno,
|
||||||
RBM_NORMAL_NO_LOG);
|
RBM_NORMAL_NO_LOG,
|
||||||
|
InvalidBuffer);
|
||||||
if (!BufferIsValid(buf))
|
if (!BufferIsValid(buf))
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
|
@ -2917,17 +2942,18 @@ ConfirmRecoveryPaused(void)
|
||||||
* Attempt to read the next XLOG record.
|
* Attempt to read the next XLOG record.
|
||||||
*
|
*
|
||||||
* Before first call, the reader needs to be positioned to the first record
|
* Before first call, the reader needs to be positioned to the first record
|
||||||
* by calling XLogBeginRead().
|
* by calling XLogPrefetcherBeginRead().
|
||||||
*
|
*
|
||||||
* If no valid record is available, returns NULL, or fails if emode is PANIC.
|
* If no valid record is available, returns NULL, or fails if emode is PANIC.
|
||||||
* (emode must be either PANIC, LOG). In standby mode, retries until a valid
|
* (emode must be either PANIC, LOG). In standby mode, retries until a valid
|
||||||
* record is available.
|
* record is available.
|
||||||
*/
|
*/
|
||||||
static XLogRecord *
|
static XLogRecord *
|
||||||
ReadRecord(XLogReaderState *xlogreader, int emode,
|
ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
|
||||||
bool fetching_ckpt, TimeLineID replayTLI)
|
bool fetching_ckpt, TimeLineID replayTLI)
|
||||||
{
|
{
|
||||||
XLogRecord *record;
|
XLogRecord *record;
|
||||||
|
XLogReaderState *xlogreader = XLogPrefetcherGetReader(xlogprefetcher);
|
||||||
XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
|
XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
|
||||||
|
|
||||||
/* Pass through parameters to XLogPageRead */
|
/* Pass through parameters to XLogPageRead */
|
||||||
|
@ -2943,7 +2969,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
|
||||||
{
|
{
|
||||||
char *errormsg;
|
char *errormsg;
|
||||||
|
|
||||||
record = XLogReadRecord(xlogreader, &errormsg);
|
record = XLogPrefetcherReadRecord(xlogprefetcher, &errormsg);
|
||||||
if (record == NULL)
|
if (record == NULL)
|
||||||
{
|
{
|
||||||
/*
|
/*
|
||||||
|
@ -3056,9 +3082,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Read the XLOG page containing RecPtr into readBuf (if not read already).
|
* Read the XLOG page containing RecPtr into readBuf (if not read already).
|
||||||
* Returns number of bytes read, if the page is read successfully, or -1
|
* Returns number of bytes read, if the page is read successfully, or
|
||||||
* in case of errors. When errors occur, they are ereport'ed, but only
|
* XLREAD_FAIL in case of errors. When errors occur, they are ereport'ed, but
|
||||||
* if they have not been previously reported.
|
* only if they have not been previously reported.
|
||||||
|
*
|
||||||
|
* While prefetching, xlogreader->nonblocking may be set. In that case,
|
||||||
|
* returns XLREAD_WOULDBLOCK if we'd otherwise have to wait for more WAL.
|
||||||
*
|
*
|
||||||
* This is responsible for restoring files from archive as needed, as well
|
* This is responsible for restoring files from archive as needed, as well
|
||||||
* as for waiting for the requested WAL record to arrive in standby mode.
|
* as for waiting for the requested WAL record to arrive in standby mode.
|
||||||
|
@ -3066,7 +3095,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
|
||||||
* 'emode' specifies the log level used for reporting "file not found" or
|
* 'emode' specifies the log level used for reporting "file not found" or
|
||||||
* "end of WAL" situations in archive recovery, or in standby mode when a
|
* "end of WAL" situations in archive recovery, or in standby mode when a
|
||||||
* trigger file is found. If set to WARNING or below, XLogPageRead() returns
|
* trigger file is found. If set to WARNING or below, XLogPageRead() returns
|
||||||
* false in those situations, on higher log levels the ereport() won't
|
* XLREAD_FAIL in those situations, on higher log levels the ereport() won't
|
||||||
* return.
|
* return.
|
||||||
*
|
*
|
||||||
* In standby mode, if after a successful return of XLogPageRead() the
|
* In standby mode, if after a successful return of XLogPageRead() the
|
||||||
|
@ -3125,20 +3154,31 @@ retry:
|
||||||
(readSource == XLOG_FROM_STREAM &&
|
(readSource == XLOG_FROM_STREAM &&
|
||||||
flushedUpto < targetPagePtr + reqLen))
|
flushedUpto < targetPagePtr + reqLen))
|
||||||
{
|
{
|
||||||
if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
|
if (readFile >= 0 &&
|
||||||
private->randAccess,
|
xlogreader->nonblocking &&
|
||||||
private->fetching_ckpt,
|
readSource == XLOG_FROM_STREAM &&
|
||||||
targetRecPtr,
|
flushedUpto < targetPagePtr + reqLen)
|
||||||
private->replayTLI,
|
return XLREAD_WOULDBLOCK;
|
||||||
xlogreader->EndRecPtr))
|
|
||||||
{
|
|
||||||
if (readFile >= 0)
|
|
||||||
close(readFile);
|
|
||||||
readFile = -1;
|
|
||||||
readLen = 0;
|
|
||||||
readSource = XLOG_FROM_ANY;
|
|
||||||
|
|
||||||
return -1;
|
switch (WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
|
||||||
|
private->randAccess,
|
||||||
|
private->fetching_ckpt,
|
||||||
|
targetRecPtr,
|
||||||
|
private->replayTLI,
|
||||||
|
xlogreader->EndRecPtr,
|
||||||
|
xlogreader->nonblocking))
|
||||||
|
{
|
||||||
|
case XLREAD_WOULDBLOCK:
|
||||||
|
return XLREAD_WOULDBLOCK;
|
||||||
|
case XLREAD_FAIL:
|
||||||
|
if (readFile >= 0)
|
||||||
|
close(readFile);
|
||||||
|
readFile = -1;
|
||||||
|
readLen = 0;
|
||||||
|
readSource = XLOG_FROM_ANY;
|
||||||
|
return XLREAD_FAIL;
|
||||||
|
case XLREAD_SUCCESS:
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -3263,7 +3303,7 @@ next_record_is_invalid:
|
||||||
if (StandbyMode)
|
if (StandbyMode)
|
||||||
goto retry;
|
goto retry;
|
||||||
else
|
else
|
||||||
return -1;
|
return XLREAD_FAIL;
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
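
The hunks above replace the old -1 and boolean returns with three outcomes: success, failure, and "would block" for nonblocking read-ahead. The sketch below models that convention in isolation. All names carry a sketch_ prefix because they are hypothetical; the numeric values of the result codes and the callback used to stand in for waiting are assumptions, not definitions taken from this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed negative codes so they cannot be confused with a byte count. */
typedef enum
{
	SKETCH_XLREAD_SUCCESS = 0,
	SKETCH_XLREAD_FAIL = -1,
	SKETCH_XLREAD_WOULDBLOCK = -2
} sketch_read_result;

/*
 * Stand-in for waiting on more WAL: returns the new flushed position,
 * or -1 if no more WAL will arrive.
 */
typedef long (*sketch_wait_fn) (void *arg);

/* A wait callback that always reports the end of WAL. */
static long
sketch_no_more_wal(void *arg)
{
	(void) arg;
	return -1;
}

/*
 * Returns req_len if the requested range is already available, otherwise
 * SKETCH_XLREAD_WOULDBLOCK for a nonblocking caller, or SKETCH_XLREAD_FAIL
 * once waiting reports that no more WAL will arrive.
 */
static int
sketch_page_read(long flushed_upto, long target_end, int req_len,
				 bool nonblocking, sketch_wait_fn wait_fn, void *arg)
{
	assert(req_len > 0);

	while (flushed_upto < target_end)
	{
		/* Read-ahead must never wait; let the caller replay first. */
		if (nonblocking)
			return SKETCH_XLREAD_WOULDBLOCK;

		flushed_upto = wait_fn(arg);
		if (flushed_upto < 0)
			return SKETCH_XLREAD_FAIL;
	}
	return req_len;
}
```

Keeping the "would block" case distinct from failure lets a caller that is only reading ahead back off without treating the situation as an error, which matches the comment in the diff about letting the caller process everything already decoded.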
@ -3292,14 +3332,18 @@ next_record_is_invalid:
|
||||||
* available.
|
* available.
|
||||||
*
|
*
|
||||||
* When the requested record becomes available, the function opens the file
|
* When the requested record becomes available, the function opens the file
|
||||||
* containing it (if not open already), and returns true. When end of standby
|
* containing it (if not open already), and returns XLREAD_SUCCESS. When end
|
||||||
* mode is triggered by the user, and there is no more WAL available, returns
|
* of standby mode is triggered by the user, and there is no more WAL
|
||||||
* false.
|
* available, returns XLREAD_FAIL.
|
||||||
|
*
|
||||||
|
* If nonblocking is true, then give up immediately if we can't satisfy the
|
||||||
|
* request, returning XLREAD_WOULDBLOCK instead of waiting.
|
||||||
*/
|
*/
|
||||||
static bool
|
static XLogPageReadResult
|
||||||
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
bool fetching_ckpt, XLogRecPtr tliRecPtr,
|
bool fetching_ckpt, XLogRecPtr tliRecPtr,
|
||||||
TimeLineID replayTLI, XLogRecPtr replayLSN)
|
TimeLineID replayTLI, XLogRecPtr replayLSN,
|
||||||
|
bool nonblocking)
|
||||||
{
|
{
|
||||||
static TimestampTz last_fail_time = 0;
|
static TimestampTz last_fail_time = 0;
|
||||||
TimestampTz now;
|
TimestampTz now;
|
||||||
|
@@ -3353,6 +3397,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
*/
|
*/
|
||||||
if (lastSourceFailed)
|
if (lastSourceFailed)
|
||||||
{
|
{
|
||||||
|
/*
|
||||||
|
* Don't allow any retry loops to occur during nonblocking
|
||||||
|
* readahead. Let the caller process everything that has been
|
||||||
|
* decoded already first.
|
||||||
|
*/
|
||||||
|
if (nonblocking)
|
||||||
|
return XLREAD_WOULDBLOCK;
|
||||||
|
|
||||||
switch (currentSource)
|
switch (currentSource)
|
||||||
{
|
{
|
||||||
case XLOG_FROM_ARCHIVE:
|
case XLOG_FROM_ARCHIVE:
|
||||||
|
@@ -3367,7 +3419,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
if (StandbyMode && CheckForStandbyTrigger())
|
if (StandbyMode && CheckForStandbyTrigger())
|
||||||
{
|
{
|
||||||
XLogShutdownWalRcv();
|
XLogShutdownWalRcv();
|
||||||
return false;
|
return XLREAD_FAIL;
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@@ -3375,7 +3427,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
* and pg_wal.
|
* and pg_wal.
|
||||||
*/
|
*/
|
||||||
if (!StandbyMode)
|
if (!StandbyMode)
|
||||||
return false;
|
return XLREAD_FAIL;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Move to XLOG_FROM_STREAM state, and set to start a
|
* Move to XLOG_FROM_STREAM state, and set to start a
|
||||||
|
@@ -3519,7 +3571,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
|
currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
|
||||||
currentSource);
|
currentSource);
|
||||||
if (readFile >= 0)
|
if (readFile >= 0)
|
||||||
return true; /* success! */
|
return XLREAD_SUCCESS; /* success! */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Nope, not found in archive or pg_wal.
|
* Nope, not found in archive or pg_wal.
|
||||||
|
@@ -3674,11 +3726,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
/* just make sure source info is correct... */
|
/* just make sure source info is correct... */
|
||||||
readSource = XLOG_FROM_STREAM;
|
readSource = XLOG_FROM_STREAM;
|
||||||
XLogReceiptSource = XLOG_FROM_STREAM;
|
XLogReceiptSource = XLOG_FROM_STREAM;
|
||||||
return true;
|
return XLREAD_SUCCESS;
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* In nonblocking mode, return rather than sleeping. */
|
||||||
|
if (nonblocking)
|
||||||
|
return XLREAD_WOULDBLOCK;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Data not here yet. Check for trigger, then wait for
|
* Data not here yet. Check for trigger, then wait for
|
||||||
* walreceiver to wake us up when new WAL arrives.
|
* walreceiver to wake us up when new WAL arrives.
|
||||||
|
@@ -3686,13 +3742,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
if (CheckForStandbyTrigger())
|
if (CheckForStandbyTrigger())
|
||||||
{
|
{
|
||||||
/*
|
/*
|
||||||
* Note that we don't "return false" immediately here.
|
* Note that we don't return XLREAD_FAIL immediately
|
||||||
* After being triggered, we still want to replay all
|
* here. After being triggered, we still want to
|
||||||
* the WAL that was already streamed. It's in pg_wal
|
* replay all the WAL that was already streamed. It's
|
||||||
* now, so we just treat this as a failure, and the
|
* in pg_wal now, so we just treat this as a failure,
|
||||||
* state machine will move on to replay the streamed
|
* and the state machine will move on to replay the
|
||||||
* WAL from pg_wal, and then recheck the trigger and
|
* streamed WAL from pg_wal, and then recheck the
|
||||||
* exit replay.
|
* trigger and exit replay.
|
||||||
*/
|
*/
|
||||||
lastSourceFailed = true;
|
lastSourceFailed = true;
|
||||||
break;
|
break;
|
||||||
|
@@ -3711,6 +3767,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
streaming_reply_sent = true;
|
streaming_reply_sent = true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Update pg_stat_recovery_prefetch before sleeping. */
|
||||||
|
XLogPrefetcherComputeStats(xlogprefetcher);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Wait for more WAL to arrive. Time out after 5 seconds
|
* Wait for more WAL to arrive. Time out after 5 seconds
|
||||||
* to react to a trigger file promptly and to check if the
|
* to react to a trigger file promptly and to check if the
|
||||||
|
@@ -3743,7 +3802,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
|
||||||
HandleStartupProcInterrupts();
|
HandleStartupProcInterrupts();
|
||||||
}
|
}
|
||||||
|
|
||||||
return false; /* not reached */
|
return XLREAD_FAIL; /* not reached */
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
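Note on the hunks above: WaitForWALToBecomeAvailable() moves from a bool to a three-way result so that a nonblocking caller can tell "no more WAL will arrive" apart from "nothing available right now". The following is a minimal standalone sketch of that control flow, not the PostgreSQL code. ReadResult, ToySource, and toy_wait_for_data() are hypothetical names invented for illustration, and the "receive 64 bytes" step merely stands in for sleeping until the walreceiver delivers more data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a three-way read result (not PostgreSQL's enum). */
typedef enum
{
	READ_SUCCESS,
	READ_FAIL,
	READ_WOULDBLOCK
} ReadResult;

/* Toy WAL source: bytes below 'flushed' have arrived; 'ended' means no more will. */
typedef struct
{
	size_t		flushed;
	bool		ended;
} ToySource;

/*
 * Report whether data up to 'upto' can be read.  A nonblocking caller gets
 * READ_WOULDBLOCK instead of waiting, so it can first process whatever it
 * has already decoded.
 */
static ReadResult
toy_wait_for_data(ToySource *src, size_t upto, bool nonblocking)
{
	assert(src != NULL);

	for (;;)
	{
		if (src->flushed >= upto)
			return READ_SUCCESS;
		if (src->ended)
			return READ_FAIL;
		if (nonblocking)
			return READ_WOULDBLOCK;

		/* A blocking caller would sleep here; the toy simply receives more. */
		src->flushed += 64;
	}
}
```

A caller doing readahead would pass nonblocking = true and stop looking ahead on READ_WOULDBLOCK, while the replay path would pass false and wait.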
||||||
|
@@ -3788,7 +3847,7 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
|
||||||
* 1 for "primary", 0 for "other" (backup_label)
|
* 1 for "primary", 0 for "other" (backup_label)
|
||||||
*/
|
*/
|
||||||
static XLogRecord *
|
static XLogRecord *
|
||||||
ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
|
ReadCheckpointRecord(XLogPrefetcher *xlogprefetcher, XLogRecPtr RecPtr,
|
||||||
int whichChkpt, bool report, TimeLineID replayTLI)
|
int whichChkpt, bool report, TimeLineID replayTLI)
|
||||||
{
|
{
|
||||||
XLogRecord *record;
|
XLogRecord *record;
|
||||||
|
@@ -3815,8 +3874,8 @@ ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
|
||||||
return NULL;
|
return NULL;
|
||||||
}
|
}
|
||||||
|
|
||||||
XLogBeginRead(xlogreader, RecPtr);
|
XLogPrefetcherBeginRead(xlogprefetcher, RecPtr);
|
||||||
record = ReadRecord(xlogreader, LOG, true, replayTLI);
|
record = ReadRecord(xlogprefetcher, LOG, true, replayTLI);
|
||||||
|
|
||||||
if (record == NULL)
|
if (record == NULL)
|
||||||
{
|
{
|
||||||
|
|
|
@@ -22,6 +22,7 @@
|
||||||
#include "access/timeline.h"
|
#include "access/timeline.h"
|
||||||
#include "access/xlogrecovery.h"
|
#include "access/xlogrecovery.h"
|
||||||
#include "access/xlog_internal.h"
|
#include "access/xlog_internal.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "access/xlogutils.h"
|
#include "access/xlogutils.h"
|
||||||
#include "miscadmin.h"
|
#include "miscadmin.h"
|
||||||
#include "pgstat.h"
|
#include "pgstat.h"
|
||||||
|
@@ -355,11 +356,13 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
RelFileNode rnode;
|
RelFileNode rnode;
|
||||||
ForkNumber forknum;
|
ForkNumber forknum;
|
||||||
BlockNumber blkno;
|
BlockNumber blkno;
|
||||||
|
Buffer prefetch_buffer;
|
||||||
Page page;
|
Page page;
|
||||||
bool zeromode;
|
bool zeromode;
|
||||||
bool willinit;
|
bool willinit;
|
||||||
|
|
||||||
if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
|
if (!XLogRecGetBlockTagExtended(record, block_id, &rnode, &forknum, &blkno,
|
||||||
|
&prefetch_buffer))
|
||||||
{
|
{
|
||||||
/* Caller specified a bogus block_id */
|
/* Caller specified a bogus block_id */
|
||||||
elog(PANIC, "failed to locate backup block with ID %d", block_id);
|
elog(PANIC, "failed to locate backup block with ID %d", block_id);
|
||||||
|
@@ -381,7 +384,8 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
{
|
{
|
||||||
Assert(XLogRecHasBlockImage(record, block_id));
|
Assert(XLogRecHasBlockImage(record, block_id));
|
||||||
*buf = XLogReadBufferExtended(rnode, forknum, blkno,
|
*buf = XLogReadBufferExtended(rnode, forknum, blkno,
|
||||||
get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
|
get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK,
|
||||||
|
prefetch_buffer);
|
||||||
page = BufferGetPage(*buf);
|
page = BufferGetPage(*buf);
|
||||||
if (!RestoreBlockImage(record, block_id, page))
|
if (!RestoreBlockImage(record, block_id, page))
|
||||||
elog(ERROR, "failed to restore block image");
|
elog(ERROR, "failed to restore block image");
|
||||||
|
@@ -410,7 +414,7 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
*buf = XLogReadBufferExtended(rnode, forknum, blkno, mode);
|
*buf = XLogReadBufferExtended(rnode, forknum, blkno, mode, prefetch_buffer);
|
||||||
if (BufferIsValid(*buf))
|
if (BufferIsValid(*buf))
|
||||||
{
|
{
|
||||||
if (mode != RBM_ZERO_AND_LOCK && mode != RBM_ZERO_AND_CLEANUP_LOCK)
|
if (mode != RBM_ZERO_AND_LOCK && mode != RBM_ZERO_AND_CLEANUP_LOCK)
|
||||||
|
@@ -450,6 +454,10 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
* exist, and we don't check for all-zeroes. Thus, no log entry is made
|
* exist, and we don't check for all-zeroes. Thus, no log entry is made
|
||||||
* to imply that the page should be dropped or truncated later.
|
* to imply that the page should be dropped or truncated later.
|
||||||
*
|
*
|
||||||
|
* Optionally, recent_buffer can be used to provide a hint about the location
|
||||||
|
* of the page in the buffer pool; it does not have to be correct, but avoids
|
||||||
|
* a buffer mapping table probe if it is.
|
||||||
|
*
|
||||||
* NB: A redo function should normally not call this directly. To get a page
|
* NB: A redo function should normally not call this directly. To get a page
|
||||||
* to modify, use XLogReadBufferForRedoExtended instead. It is important that
|
* to modify, use XLogReadBufferForRedoExtended instead. It is important that
|
||||||
* all pages modified by a WAL record are registered in the WAL records, or
|
* all pages modified by a WAL record are registered in the WAL records, or
|
||||||
|
@@ -457,7 +465,8 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
*/
|
*/
|
||||||
Buffer
|
Buffer
|
||||||
XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
||||||
BlockNumber blkno, ReadBufferMode mode)
|
BlockNumber blkno, ReadBufferMode mode,
|
||||||
|
Buffer recent_buffer)
|
||||||
{
|
{
|
||||||
BlockNumber lastblock;
|
BlockNumber lastblock;
|
||||||
Buffer buffer;
|
Buffer buffer;
|
||||||
|
@@ -465,6 +474,15 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
||||||
|
|
||||||
Assert(blkno != P_NEW);
|
Assert(blkno != P_NEW);
|
||||||
|
|
||||||
|
/* Do we have a clue where the buffer might be already? */
|
||||||
|
if (BufferIsValid(recent_buffer) &&
|
||||||
|
mode == RBM_NORMAL &&
|
||||||
|
ReadRecentBuffer(rnode, forknum, blkno, recent_buffer))
|
||||||
|
{
|
||||||
|
buffer = recent_buffer;
|
||||||
|
goto recent_buffer_fast_path;
|
||||||
|
}
|
||||||
|
|
||||||
/* Open the relation at smgr level */
|
/* Open the relation at smgr level */
|
||||||
smgr = smgropen(rnode, InvalidBackendId);
|
smgr = smgropen(rnode, InvalidBackendId);
|
||||||
|
|
||||||
|
@@ -523,6 +541,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
recent_buffer_fast_path:
|
||||||
if (mode == RBM_NORMAL)
|
if (mode == RBM_NORMAL)
|
||||||
{
|
{
|
||||||
/* check that page has been initialized */
|
/* check that page has been initialized */
|
||||||
|
|
|
@@ -930,6 +930,20 @@ CREATE VIEW pg_stat_wal_receiver AS
|
||||||
FROM pg_stat_get_wal_receiver() s
|
FROM pg_stat_get_wal_receiver() s
|
||||||
WHERE s.pid IS NOT NULL;
|
WHERE s.pid IS NOT NULL;
|
||||||
|
|
||||||
|
CREATE VIEW pg_stat_recovery_prefetch AS
|
||||||
|
SELECT
|
||||||
|
s.stats_reset,
|
||||||
|
s.prefetch,
|
||||||
|
s.hit,
|
||||||
|
s.skip_init,
|
||||||
|
s.skip_new,
|
||||||
|
s.skip_fpw,
|
||||||
|
s.skip_rep,
|
||||||
|
s.wal_distance,
|
||||||
|
s.block_distance,
|
||||||
|
s.io_depth
|
||||||
|
FROM pg_stat_get_recovery_prefetch() s;
|
||||||
|
|
||||||
CREATE VIEW pg_stat_subscription AS
|
CREATE VIEW pg_stat_subscription AS
|
||||||
SELECT
|
SELECT
|
||||||
su.oid AS subid,
|
su.oid AS subid,
|
||||||
|
|
|
@@ -649,6 +649,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber forkNum, BlockNumber blockNum,
|
||||||
pg_atomic_write_u32(&bufHdr->state,
|
pg_atomic_write_u32(&bufHdr->state,
|
||||||
buf_state + BUF_USAGECOUNT_ONE);
|
buf_state + BUF_USAGECOUNT_ONE);
|
||||||
|
|
||||||
|
pgBufferUsage.local_blks_hit++;
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@@ -680,6 +682,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber forkNum, BlockNumber blockNum,
|
||||||
else
|
else
|
||||||
PinBuffer_Locked(bufHdr); /* pin for first time */
|
PinBuffer_Locked(bufHdr); /* pin for first time */
|
||||||
|
|
||||||
|
pgBufferUsage.shared_blks_hit++;
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@@ -211,7 +211,8 @@ XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
|
||||||
blkno = fsm_logical_to_physical(addr);
|
blkno = fsm_logical_to_physical(addr);
|
||||||
|
|
||||||
/* If the page doesn't exist already, extend */
|
/* If the page doesn't exist already, extend */
|
||||||
buf = XLogReadBufferExtended(rnode, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR);
|
buf = XLogReadBufferExtended(rnode, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR,
|
||||||
|
InvalidBuffer);
|
||||||
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
|
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
|
||||||
|
|
||||||
page = BufferGetPage(buf);
|
page = BufferGetPage(buf);
|
||||||
|
|
|
@@ -22,6 +22,7 @@
|
||||||
#include "access/subtrans.h"
|
#include "access/subtrans.h"
|
||||||
#include "access/syncscan.h"
|
#include "access/syncscan.h"
|
||||||
#include "access/twophase.h"
|
#include "access/twophase.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "access/xlogrecovery.h"
|
#include "access/xlogrecovery.h"
|
||||||
#include "commands/async.h"
|
#include "commands/async.h"
|
||||||
#include "miscadmin.h"
|
#include "miscadmin.h"
|
||||||
|
@@ -119,6 +120,7 @@ CalculateShmemSize(int *num_semaphores)
|
||||||
size = add_size(size, LockShmemSize());
|
size = add_size(size, LockShmemSize());
|
||||||
size = add_size(size, PredicateLockShmemSize());
|
size = add_size(size, PredicateLockShmemSize());
|
||||||
size = add_size(size, ProcGlobalShmemSize());
|
size = add_size(size, ProcGlobalShmemSize());
|
||||||
|
size = add_size(size, XLogPrefetchShmemSize());
|
||||||
size = add_size(size, XLOGShmemSize());
|
size = add_size(size, XLOGShmemSize());
|
||||||
size = add_size(size, XLogRecoveryShmemSize());
|
size = add_size(size, XLogRecoveryShmemSize());
|
||||||
size = add_size(size, CLOGShmemSize());
|
size = add_size(size, CLOGShmemSize());
|
||||||
|
@@ -244,6 +246,7 @@ CreateSharedMemoryAndSemaphores(void)
|
||||||
* Set up xlog, clog, and buffers
|
* Set up xlog, clog, and buffers
|
||||||
*/
|
*/
|
||||||
XLOGShmemInit();
|
XLOGShmemInit();
|
||||||
|
XLogPrefetchShmemInit();
|
||||||
XLogRecoveryShmemInit();
|
XLogRecoveryShmemInit();
|
||||||
CLOGShmemInit();
|
CLOGShmemInit();
|
||||||
CommitTsShmemInit();
|
CommitTsShmemInit();
|
||||||
|
|
|
@@ -162,9 +162,11 @@ mdexists(SMgrRelation reln, ForkNumber forkNum)
|
||||||
{
|
{
|
||||||
/*
|
/*
|
||||||
* Close it first, to ensure that we notice if the fork has been unlinked
|
* Close it first, to ensure that we notice if the fork has been unlinked
|
||||||
* since we opened it.
|
* since we opened it. As an optimization, we can skip that in recovery,
|
||||||
|
* which already closes relations when dropping them.
|
||||||
*/
|
*/
|
||||||
mdclose(reln, forkNum);
|
if (!InRecovery)
|
||||||
|
mdclose(reln, forkNum);
|
||||||
|
|
||||||
return (mdopenfork(reln, forkNum, EXTENSION_RETURN_NULL) != NULL);
|
return (mdopenfork(reln, forkNum, EXTENSION_RETURN_NULL) != NULL);
|
||||||
}
|
}
|
||||||
|
|
|
@@ -16,6 +16,7 @@
|
||||||
|
|
||||||
#include "access/htup_details.h"
|
#include "access/htup_details.h"
|
||||||
#include "access/xlog.h"
|
#include "access/xlog.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "catalog/pg_authid.h"
|
#include "catalog/pg_authid.h"
|
||||||
#include "catalog/pg_type.h"
|
#include "catalog/pg_type.h"
|
||||||
#include "common/ip.h"
|
#include "common/ip.h"
|
||||||
|
@@ -2103,13 +2104,15 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
|
||||||
pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
|
pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
|
||||||
pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
|
pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
|
||||||
}
|
}
|
||||||
|
else if (strcmp(target, "recovery_prefetch") == 0)
|
||||||
|
XLogPrefetchResetStats();
|
||||||
else if (strcmp(target, "wal") == 0)
|
else if (strcmp(target, "wal") == 0)
|
||||||
pgstat_reset_of_kind(PGSTAT_KIND_WAL);
|
pgstat_reset_of_kind(PGSTAT_KIND_WAL);
|
||||||
else
|
else
|
||||||
ereport(ERROR,
|
ereport(ERROR,
|
||||||
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
||||||
errmsg("unrecognized reset target: \"%s\"", target),
|
errmsg("unrecognized reset target: \"%s\"", target),
|
||||||
errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")));
|
errhint("Target must be \"archiver\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
|
||||||
|
|
||||||
PG_RETURN_VOID();
|
PG_RETURN_VOID();
|
||||||
}
|
}
|
||||||
|
|
|
@@ -41,6 +41,7 @@
|
||||||
#include "access/twophase.h"
|
#include "access/twophase.h"
|
||||||
#include "access/xact.h"
|
#include "access/xact.h"
|
||||||
#include "access/xlog_internal.h"
|
#include "access/xlog_internal.h"
|
||||||
|
#include "access/xlogprefetcher.h"
|
||||||
#include "access/xlogrecovery.h"
|
#include "access/xlogrecovery.h"
|
||||||
#include "catalog/namespace.h"
|
#include "catalog/namespace.h"
|
||||||
#include "catalog/objectaccess.h"
|
#include "catalog/objectaccess.h"
|
||||||
|
@@ -217,6 +218,7 @@ static bool check_effective_io_concurrency(int *newval, void **extra, GucSource
|
||||||
static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source);
|
static bool check_maintenance_io_concurrency(int *newval, void **extra, GucSource source);
|
||||||
static bool check_huge_page_size(int *newval, void **extra, GucSource source);
|
static bool check_huge_page_size(int *newval, void **extra, GucSource source);
|
||||||
static bool check_client_connection_check_interval(int *newval, void **extra, GucSource source);
|
static bool check_client_connection_check_interval(int *newval, void **extra, GucSource source);
|
||||||
|
static void assign_maintenance_io_concurrency(int newval, void *extra);
|
||||||
static bool check_application_name(char **newval, void **extra, GucSource source);
|
static bool check_application_name(char **newval, void **extra, GucSource source);
|
||||||
static void assign_application_name(const char *newval, void *extra);
|
static void assign_application_name(const char *newval, void *extra);
|
||||||
static bool check_cluster_name(char **newval, void **extra, GucSource source);
|
static bool check_cluster_name(char **newval, void **extra, GucSource source);
|
||||||
|
@@ -495,6 +497,19 @@ static const struct config_enum_entry huge_pages_options[] = {
|
||||||
{NULL, 0, false}
|
{NULL, 0, false}
|
||||||
};
|
};
|
||||||
|
|
||||||
|
static const struct config_enum_entry recovery_prefetch_options[] = {
|
||||||
|
{"off", RECOVERY_PREFETCH_OFF, false},
|
||||||
|
{"on", RECOVERY_PREFETCH_ON, false},
|
||||||
|
{"try", RECOVERY_PREFETCH_TRY, false},
|
||||||
|
{"true", RECOVERY_PREFETCH_ON, true},
|
||||||
|
{"false", RECOVERY_PREFETCH_OFF, true},
|
||||||
|
{"yes", RECOVERY_PREFETCH_ON, true},
|
||||||
|
{"no", RECOVERY_PREFETCH_OFF, true},
|
||||||
|
{"1", RECOVERY_PREFETCH_ON, true},
|
||||||
|
{"0", RECOVERY_PREFETCH_OFF, true},
|
||||||
|
{NULL, 0, false}
|
||||||
|
};
|
||||||
|
|
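Note on recovery_prefetch_options above: the table lists three visible values ("off", "on", "try") followed by boolean-style aliases whose third field is true, which marks them as hidden. A minimal standalone sketch of how such a table can be consulted follows. ToyOption, toy_parse(), and toy_canonical_name() are hypothetical names, the table is shortened, and the sketch uses exact string matching for simplicity; it does not reproduce the real GUC machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
	PREFETCH_OFF,
	PREFETCH_ON,
	PREFETCH_TRY
} ToyPrefetch;

/* Same shape as the entries in the hunk: name, value, hidden flag. */
typedef struct
{
	const char *name;
	int			val;
	bool		hidden;
} ToyOption;

static const ToyOption toy_options[] = {
	{"off", PREFETCH_OFF, false},
	{"on", PREFETCH_ON, false},
	{"try", PREFETCH_TRY, false},
	{"true", PREFETCH_ON, true},	/* accepted, but never displayed */
	{"false", PREFETCH_OFF, true},
	{NULL, 0, false}
};

/* Accept any listed spelling, hidden or not; return false if unknown. */
static bool
toy_parse(const char *input, int *val)
{
	assert(input != NULL && val != NULL);

	for (const ToyOption *o = toy_options; o->name != NULL; o++)
	{
		if (strcmp(o->name, input) == 0)
		{
			*val = o->val;
			return true;
		}
	}
	return false;
}

/* Map a value back to its first non-hidden spelling, or NULL if none. */
static const char *
toy_canonical_name(int val)
{
	for (const ToyOption *o = toy_options; o->name != NULL; o++)
	{
		if (!o->hidden && o->val == val)
			return o->name;
	}
	return NULL;
}
```

With this layout, a setting written as "true" parses to the same value as "on" and is reported back as "on".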
||||||
static const struct config_enum_entry force_parallel_mode_options[] = {
|
static const struct config_enum_entry force_parallel_mode_options[] = {
|
||||||
{"off", FORCE_PARALLEL_OFF, false},
|
{"off", FORCE_PARALLEL_OFF, false},
|
||||||
{"on", FORCE_PARALLEL_ON, false},
|
{"on", FORCE_PARALLEL_ON, false},
|
||||||
|
@@ -785,6 +800,8 @@ const char *const config_group_names[] =
|
||||||
gettext_noop("Write-Ahead Log / Checkpoints"),
|
gettext_noop("Write-Ahead Log / Checkpoints"),
|
||||||
/* WAL_ARCHIVING */
|
/* WAL_ARCHIVING */
|
||||||
gettext_noop("Write-Ahead Log / Archiving"),
|
gettext_noop("Write-Ahead Log / Archiving"),
|
||||||
|
/* WAL_RECOVERY */
|
||||||
|
gettext_noop("Write-Ahead Log / Recovery"),
|
||||||
/* WAL_ARCHIVE_RECOVERY */
|
/* WAL_ARCHIVE_RECOVERY */
|
||||||
gettext_noop("Write-Ahead Log / Archive Recovery"),
|
gettext_noop("Write-Ahead Log / Archive Recovery"),
|
||||||
/* WAL_RECOVERY_TARGET */
|
/* WAL_RECOVERY_TARGET */
|
||||||
|
@@ -2818,6 +2835,17 @@ static struct config_int ConfigureNamesInt[] =
|
||||||
NULL, NULL, NULL
|
NULL, NULL, NULL
|
||||||
},
|
},
|
||||||
|
|
||||||
|
{
|
||||||
|
{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_RECOVERY,
|
||||||
|
gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."),
|
||||||
|
gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."),
|
||||||
|
GUC_UNIT_BYTE
|
||||||
|
},
|
||||||
|
&wal_decode_buffer_size,
|
||||||
|
512 * 1024, 64 * 1024, MaxAllocSize,
|
||||||
|
NULL, NULL, NULL
|
||||||
|
},
|
||||||
|
|
||||||
{
|
{
|
||||||
{"wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
|
{"wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
|
||||||
gettext_noop("Sets the size of WAL files held for standby servers."),
|
gettext_noop("Sets the size of WAL files held for standby servers."),
|
||||||
|
@@ -3141,7 +3169,8 @@ static struct config_int ConfigureNamesInt[] =
|
||||||
0,
|
0,
|
||||||
#endif
|
#endif
|
||||||
0, MAX_IO_CONCURRENCY,
|
0, MAX_IO_CONCURRENCY,
|
||||||
check_maintenance_io_concurrency, NULL, NULL
|
check_maintenance_io_concurrency, assign_maintenance_io_concurrency,
|
||||||
|
NULL
|
||||||
},
|
},
|
||||||
|
|
||||||
{
|
{
|
||||||
|
@@ -5013,6 +5042,16 @@ static struct config_enum ConfigureNamesEnum[] =
|
||||||
NULL, NULL, NULL
|
NULL, NULL, NULL
|
||||||
},
|
},
|
||||||
|
|
||||||
|
{
|
||||||
|
{"recovery_prefetch", PGC_SIGHUP, WAL_RECOVERY,
|
||||||
|
gettext_noop("Prefetch referenced blocks during recovery"),
|
||||||
|
gettext_noop("Look ahead in the WAL to find references to uncached data.")
|
||||||
|
},
|
||||||
|
&recovery_prefetch,
|
||||||
|
RECOVERY_PREFETCH_TRY, recovery_prefetch_options,
|
||||||
|
check_recovery_prefetch, assign_recovery_prefetch, NULL
|
||||||
|
},
|
||||||
|
|
||||||
{
|
{
|
||||||
{"force_parallel_mode", PGC_USERSET, DEVELOPER_OPTIONS,
|
{"force_parallel_mode", PGC_USERSET, DEVELOPER_OPTIONS,
|
||||||
gettext_noop("Forces use of parallel query facilities."),
|
gettext_noop("Forces use of parallel query facilities."),
|
||||||
|
@@ -12422,6 +12461,20 @@ check_client_connection_check_interval(int *newval, void **extra, GucSource sour
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void
|
||||||
|
assign_maintenance_io_concurrency(int newval, void *extra)
|
||||||
|
{
|
||||||
|
#ifdef USE_PREFETCH
|
||||||
|
/*
|
||||||
|
* Reconfigure recovery prefetching, because a setting it depends on
|
||||||
|
* changed.
|
||||||
|
*/
|
||||||
|
maintenance_io_concurrency = newval;
|
||||||
|
if (AmStartupProcess())
|
||||||
|
XLogPrefetchReconfigure();
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
static bool
|
static bool
|
||||||
check_application_name(char **newval, void **extra, GucSource source)
|
check_application_name(char **newval, void **extra, GucSource source)
|
||||||
{
|
{
|
||||||
|
|
|
@@ -241,6 +241,12 @@
|
||||||
#max_wal_size = 1GB
|
#max_wal_size = 1GB
|
||||||
#min_wal_size = 80MB
|
#min_wal_size = 80MB
|
||||||
|
|
||||||
|
# - Prefetching during recovery -
|
||||||
|
|
||||||
|
#recovery_prefetch = try # prefetch pages referenced in the WAL?
|
||||||
|
#wal_decode_buffer_size = 512kB # lookahead window used for prefetching
|
||||||
|
# (change requires restart)
|
||||||
|
|
||||||
# - Archiving -
|
# - Archiving -
|
||||||
|
|
||||||
#archive_mode = off # enables archiving; off, on, or always
|
#archive_mode = off # enables archiving; off, on, or always
|
||||||
|
|
|
@@ -50,6 +50,7 @@ extern bool *wal_consistency_checking;
|
||||||
extern char *wal_consistency_checking_string;
|
extern char *wal_consistency_checking_string;
|
||||||
extern bool log_checkpoints;
|
extern bool log_checkpoints;
|
||||||
extern bool track_wal_io_timing;
|
extern bool track_wal_io_timing;
|
||||||
|
extern int wal_decode_buffer_size;
|
||||||
|
|
||||||
extern int CheckPointSegments;
|
extern int CheckPointSegments;
|
||||||
|
|
||||||
|
|
|
@@ -0,0 +1,53 @@
|
||||||
|
/*-------------------------------------------------------------------------
|
||||||
|
*
|
||||||
|
* xlogprefetcher.h
|
||||||
|
* Declarations for the recovery prefetching module.
|
||||||
|
*
|
||||||
|
* Portions Copyright (c) 2022, PostgreSQL Global Development Group
|
||||||
|
* Portions Copyright (c) 1994, Regents of the University of California
|
||||||
|
*
|
||||||
|
* IDENTIFICATION
|
||||||
|
* src/include/access/xlogprefetcher.h
|
||||||
|
*-------------------------------------------------------------------------
|
||||||
|
*/
|
||||||
|
#ifndef XLOGPREFETCHER_H
|
||||||
|
#define XLOGPREFETCHER_H
|
||||||
|
|
||||||
|
#include "access/xlogdefs.h"
|
||||||
|
|
||||||
|
/* GUCs */
|
||||||
|
extern int recovery_prefetch;
|
||||||
|
|
||||||
|
/* Possible values for recovery_prefetch */
|
||||||
|
typedef enum
|
||||||
|
{
|
||||||
|
RECOVERY_PREFETCH_OFF,
|
||||||
|
RECOVERY_PREFETCH_ON,
|
||||||
|
RECOVERY_PREFETCH_TRY
|
||||||
|
} RecoveryPrefetchValue;
|
||||||
|
|
||||||
|
struct XLogPrefetcher;
|
||||||
|
typedef struct XLogPrefetcher XLogPrefetcher;
|
||||||
|
|
||||||
|
|
||||||
|
extern void XLogPrefetchReconfigure(void);
|
||||||
|
|
||||||
|
extern size_t XLogPrefetchShmemSize(void);
|
||||||
|
extern void XLogPrefetchShmemInit(void);
|
||||||
|
|
||||||
|
extern void XLogPrefetchResetStats(void);
|
||||||
|
|
||||||
|
extern XLogPrefetcher *XLogPrefetcherAllocate(XLogReaderState *reader);
|
||||||
|
extern void XLogPrefetcherFree(XLogPrefetcher *prefetcher);
|
||||||
|
|
||||||
|
extern XLogReaderState *XLogPrefetcherGetReader(XLogPrefetcher *prefetcher);
|
||||||
|
|
||||||
|
extern void XLogPrefetcherBeginRead(XLogPrefetcher *prefetcher,
|
||||||
|
XLogRecPtr recPtr);
|
||||||
|
|
||||||
|
extern XLogRecord *XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher,
|
||||||
|
char **errmsg);
|
||||||
|
|
||||||
|
extern void XLogPrefetcherComputeStats(XLogPrefetcher *prefetcher);
|
||||||
|
|
||||||
|
#endif
|
|
@@ -39,6 +39,7 @@
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
#include "access/xlogrecord.h"
|
#include "access/xlogrecord.h"
|
||||||
|
#include "storage/buf.h"
|
||||||
|
|
||||||
/* WALOpenSegment represents a WAL segment being read. */
|
/* WALOpenSegment represents a WAL segment being read. */
|
||||||
typedef struct WALOpenSegment
|
typedef struct WALOpenSegment
|
||||||
|
@@ -125,6 +126,9 @@ typedef struct
|
||||||
ForkNumber forknum;
|
ForkNumber forknum;
|
||||||
BlockNumber blkno;
|
BlockNumber blkno;
|
||||||
|
|
||||||
|
/* Prefetching workspace. */
|
||||||
|
Buffer prefetch_buffer;
|
||||||
|
|
||||||
/* copy of the fork_flags field from the XLogRecordBlockHeader */
|
/* copy of the fork_flags field from the XLogRecordBlockHeader */
|
||||||
uint8 flags;
|
uint8 flags;
|
||||||
|
|
||||||
|
@@ -430,5 +434,9 @@ extern char *XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *
|
||||||
extern bool XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
|
extern bool XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
|
||||||
RelFileNode *rnode, ForkNumber *forknum,
|
RelFileNode *rnode, ForkNumber *forknum,
|
||||||
BlockNumber *blknum);
|
BlockNumber *blknum);
|
||||||
|
extern bool XLogRecGetBlockTagExtended(XLogReaderState *record, uint8 block_id,
|
||||||
|
RelFileNode *rnode, ForkNumber *forknum,
|
||||||
|
BlockNumber *blknum,
|
||||||
|
Buffer *prefetch_buffer);
|
||||||
|
|
||||||
#endif /* XLOGREADER_H */
|
#endif /* XLOGREADER_H */
|
||||||
|
|
|
@@ -84,7 +84,8 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
|
||||||
Buffer *buf);
|
Buffer *buf);
|
||||||
|
|
||||||
extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
|
||||||
BlockNumber blkno, ReadBufferMode mode);
|
BlockNumber blkno, ReadBufferMode mode,
|
||||||
|
Buffer recent_buffer);
|
||||||
|
|
||||||
extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
|
extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
|
||||||
extern void FreeFakeRelcacheEntry(Relation fakerel);
|
extern void FreeFakeRelcacheEntry(Relation fakerel);
|
||||||
|
|
|
@@ -53,6 +53,6 @@
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/* yyyymmddN */
|
/* yyyymmddN */
|
||||||
#define CATALOG_VERSION_NO 202204073
|
#define CATALOG_VERSION_NO 202204074
|
||||||
|
|
||||||
#endif
|
#endif
|
||||||
|
|
|
@@ -5654,6 +5654,13 @@
|
||||||
proargmodes => '{o,o,o,o,o,o,o,o,o}',
|
proargmodes => '{o,o,o,o,o,o,o,o,o}',
|
||||||
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
|
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
|
||||||
prosrc => 'pg_stat_get_wal' },
|
prosrc => 'pg_stat_get_wal' },
|
||||||
|
{ oid => '9085', descr => 'statistics: information about WAL prefetching',
|
||||||
|
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', provolatile => 'v',
|
||||||
|
proretset => 't', prorettype => 'record', proargtypes => '',
|
||||||
|
proallargtypes => '{timestamptz,int8,int8,int8,int8,int8,int8,int4,int4,int4}',
|
||||||
|
proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
|
||||||
|
proargnames => '{stats_reset,prefetch,hit,skip_init,skip_new,skip_fpw,skip_rep,wal_distance,block_distance,io_depth}',
|
||||||
|
prosrc => 'pg_stat_get_recovery_prefetch' },
|
||||||
|
|
||||||
{ oid => '2306', descr => 'statistics: information about SLRU caches',
|
{ oid => '2306', descr => 'statistics: information about SLRU caches',
|
||||||
proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
|
proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
|
||||||
|
|
|
@@ -453,4 +453,8 @@ extern void assign_search_path(const char *newval, void *extra);
|
||||||
extern bool check_wal_buffers(int *newval, void **extra, GucSource source);
|
extern bool check_wal_buffers(int *newval, void **extra, GucSource source);
|
||||||
extern void assign_xlog_sync_method(int new_sync_method, void *extra);
|
extern void assign_xlog_sync_method(int new_sync_method, void *extra);
|
||||||
|
|
||||||
|
/* in access/transam/xlogprefetcher.c */
|
||||||
|
extern bool check_recovery_prefetch(int *new_value, void **extra, GucSource source);
|
||||||
|
extern void assign_recovery_prefetch(int new_value, void *extra);
|
||||||
|
|
||||||
#endif /* GUC_H */
|
#endif /* GUC_H */
|
||||||
|
|
|
@@ -67,6 +67,7 @@ enum config_group
|
||||||
WAL_SETTINGS,
|
WAL_SETTINGS,
|
||||||
WAL_CHECKPOINTS,
|
WAL_CHECKPOINTS,
|
||||||
WAL_ARCHIVING,
|
WAL_ARCHIVING,
|
||||||
|
WAL_RECOVERY,
|
||||||
WAL_ARCHIVE_RECOVERY,
|
WAL_ARCHIVE_RECOVERY,
|
||||||
WAL_RECOVERY_TARGET,
|
WAL_RECOVERY_TARGET,
|
||||||
REPLICATION_SENDING,
|
REPLICATION_SENDING,
|
||||||
|
|
|
@@ -2019,6 +2019,17 @@ pg_stat_progress_vacuum| SELECT s.pid,
|
||||||
s.param7 AS num_dead_tuples
|
s.param7 AS num_dead_tuples
|
||||||
FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
|
FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
|
||||||
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
|
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
|
||||||
|
pg_stat_recovery_prefetch| SELECT s.stats_reset,
|
||||||
|
s.prefetch,
|
||||||
|
s.hit,
|
||||||
|
s.skip_init,
|
||||||
|
s.skip_new,
|
||||||
|
s.skip_fpw,
|
||||||
|
s.skip_rep,
|
||||||
|
s.wal_distance,
|
||||||
|
s.block_distance,
|
||||||
|
s.io_depth
|
||||||
|
FROM pg_stat_get_recovery_prefetch() s(stats_reset, prefetch, hit, skip_init, skip_new, skip_fpw, skip_rep, wal_distance, block_distance, io_depth);
|
||||||
pg_stat_replication| SELECT s.pid,
|
pg_stat_replication| SELECT s.pid,
|
||||||
s.usesysid,
|
s.usesysid,
|
||||||
u.rolname AS usename,
|
u.rolname AS usename,
|
||||||
|
|
|
@@ -1421,6 +1421,9 @@ LogicalRepWorker
|
||||||
LogicalRewriteMappingData
|
LogicalRewriteMappingData
|
||||||
LogicalTape
|
LogicalTape
|
||||||
LogicalTapeSet
|
LogicalTapeSet
|
||||||
|
LsnReadQueue
|
||||||
|
LsnReadQueueNextFun
|
||||||
|
LsnReadQueueNextStatus
|
||||||
LtreeGistOptions
|
LtreeGistOptions
|
||||||
LtreeSignature
|
LtreeSignature
|
||||||
MAGIC
|
MAGIC
|
||||||
|
@@ -2949,6 +2952,9 @@ XLogPageHeaderData
|
||||||
XLogPageReadCB
|
XLogPageReadCB
|
||||||
XLogPageReadPrivate
|
XLogPageReadPrivate
|
||||||
XLogPageReadResult
|
XLogPageReadResult
|
||||||
|
XLogPrefetcher
|
||||||
|
XLogPrefetcherFilter
|
||||||
|
XLogPrefetchStats
|
||||||
XLogReaderRoutine
|
XLogReaderRoutine
|
||||||
XLogReaderState
|
XLogReaderState
|
||||||
XLogRecData
|
XLogRecData
|
||||||
|
|