Improve autoprewarm's handling of early-shutdown scenarios.

Bad things happen if the DBA issues "pg_ctl stop -m fast" before
autoprewarm finishes loading its list of blocks to prewarm.
The current worker process successfully terminates early, but
(if this wasn't the last database with blocks to prewarm) the
leader process will just try to launch another worker for the
next database.  Since the postmaster is now in PM_WAIT_BACKENDS
state, it ignores the launch request, and the leader just sits
until it's killed manually.

This is mostly the fault of our half-baked design for launching
background workers, but a proper fix for that is likely to be
too invasive to be back-patchable.  To ameliorate the situation,
fix apw_load_buffers() to check whether SIGTERM has arrived
just before trying to launch another worker.  That leaves us with
only a very narrow window in each worker launch where SIGTERM
could occur between the launch request and successful worker start.

Another issue is that if the leader process does manage to exit,
it unconditionally rewrites autoprewarm.blocks with only the
blocks currently in shared buffers, thus forgetting any blocks
that we hadn't reached yet while prewarming.  This seems quite
unhelpful, since the next database start will then not have the
expected prewarming benefit.  Fix it to not modify the file if
we shut down before the initial load attempt is complete.

Per bug #16785 from John Thompson.  Back-patch to v11 where
the autoprewarm code was introduced.

Discussion: https://postgr.es/m/16785-c0207d8c67fb5f25@postgresql.org
This commit is contained in:
Tom Lane 2020-12-22 13:23:49 -05:00
parent 08dde1b3dc
commit ff769831e0
1 changed files with 25 additions and 5 deletions

View File

@ -153,6 +153,7 @@ void
autoprewarm_main(Datum main_arg)
{
bool first_time = true;
bool final_dump_allowed = true;
TimestampTz last_dump_time = 0;
/* Establish signal handlers; once that's done, unblock signals. */
@ -193,10 +194,15 @@ autoprewarm_main(Datum main_arg)
* There's not much point in performing a dump immediately after we finish
* preloading; so, if we do end up preloading, consider the last dump time
* to be equal to the current time.
*
* If apw_load_buffers() is terminated early by a shutdown request,
* prevent dumping out our state below the loop, because we'd effectively
* just truncate the saved state to however much we'd managed to preload.
*/
if (first_time)
{
apw_load_buffers();
final_dump_allowed = !ShutdownRequestPending;
last_dump_time = GetCurrentTimestamp();
}
@ -254,7 +260,8 @@ autoprewarm_main(Datum main_arg)
* Dump one last time. We assume this is probably the result of a system
* shutdown, although it's possible that we've merely been terminated.
*/
apw_dump_now(true, true);
if (final_dump_allowed)
apw_dump_now(true, true);
}
/*
@ -387,6 +394,18 @@ apw_load_buffers(void)
if (!have_free_buffer())
break;
/*
* Likewise, don't launch if we've already been told to shut down.
*
* There is a race condition here: if the postmaster has received a
* fast-shutdown signal, but we've not heard about it yet, then the
* postmaster will ignore our worker start request and we'll wait
* forever. However, that's a bug in the general background-worker
* logic, not the fault of this module.
*/
if (ShutdownRequestPending)
break;
/*
* Start a per-database worker to load blocks for this database; this
* function will return once the per-database worker exits.
@ -404,10 +423,11 @@ apw_load_buffers(void)
apw_state->pid_using_dumpfile = InvalidPid;
LWLockRelease(&apw_state->lock);
/* Report our success. */
ereport(LOG,
(errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
apw_state->prewarmed_blocks, num_elements)));
/* Report our success, if we were able to finish. */
if (!ShutdownRequestPending)
ereport(LOG,
(errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
apw_state->prewarmed_blocks, num_elements)));
}
/*