Don't retry restore_command while reading ahead.

Suppress further attempts to read ahead in the WAL if we run out of
data, until the records already decoded have been replayed.  This
restores the traditional behavior for continuous archive recovery, which
is to retry the failing restore_command only every 5 seconds.  With the
coding in 5dc0418f, we would start retrying every time through the
recovery loop when our WAL decoding window hit the end of the current
segment and we tried to look ahead into a not-yet-available next file.
That was very slow.

Also change the no_readahead_until mechanism to use <= rather than <,
which seems more useful.  Otherwise we'd either get one extra unwanted
retry of restore_command, or we'd need to add 1 to an LSN.

No change in behavior for regular streaming.  That was already limited
by the flushedUpto variable, which won't be updated until we replay what
we have already.

Reported by Andres Freund while analyzing the failure of a TAP test on
build farm animal skink (investigation ongoing but probably due to
otherwise unrelated timing bugs triggered by this slowness magnified by
valgrind).

Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de
Thomas Munro 2022-04-17 10:22:03 +12:00
parent 4a736a161c
commit acf1dd4234
1 changed file with 8 additions and 3 deletions


@@ -487,8 +487,8 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
  */
 nonblocking = XLogReaderHasQueuedRecordOrError(reader);
-/* Certain records act as barriers for all readahead. */
-if (nonblocking && replaying_lsn < prefetcher->no_readahead_until)
+/* Readahead is disabled until we replay past a certain point. */
+if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until)
     return LRQ_NEXT_AGAIN;
 record = XLogReadAhead(prefetcher->reader, nonblocking);
@@ -496,8 +496,13 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
 {
     /*
      * We can't read any more, due to an error or lack of data in
-     * nonblocking mode.
+     * nonblocking mode.  Don't try to read ahead again until
+     * we've replayed everything already decoded.
      */
+    if (nonblocking && prefetcher->reader->decode_queue_tail)
+        prefetcher->no_readahead_until =
+            prefetcher->reader->decode_queue_tail->lsn;
     return LRQ_NEXT_AGAIN;
 }