Fix some minor postmaster-state-machine issues.

In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
This commit is contained in:
Tom Lane 2019-04-24 14:15:44 -04:00
parent 0a999e1290
commit 0fae846232
1 changed files with 26 additions and 11 deletions

View File

@ -3811,12 +3811,8 @@ PostmasterStateMachine(void)
* dead_end children left. There shouldn't be any regular backends
* left by now anyway; what we're really waiting for is walsenders and
* archiver.
*
* Walreceiver should normally be dead by now, but not when a fast
* shutdown is performed during recovery.
*/
if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
WalReceiverPID == 0)
if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0)
{
pmState = PM_WAIT_DEAD_END;
}
@ -5213,16 +5209,25 @@ sigusr1_handler(SIGNAL_ARGS)
MaybeStartWalReceiver();
}
if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
(pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_BACKENDS))
/*
* Try to advance postmaster's state machine, if a child requests it.
*
* Be careful about the order of this action relative to sigusr1_handler's
* other actions. Generally, this should be after other actions, in case
* they have effects PostmasterStateMachine would need to know about.
* However, we should do it before the CheckPromoteSignal step, which
* cannot have any (immediate) effect on the state machine, but does
* depend on what state we're in now.
*/
if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE))
{
/* Advance postmaster's state machine */
PostmasterStateMachine();
}
if (CheckPromoteSignal() && StartupPID != 0 &&
if (StartupPID != 0 &&
(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
CheckPromoteSignal())
{
/* Tell startup process to finish recovery */
signal_child(StartupPID, SIGUSR2);
@ -5518,6 +5523,14 @@ StartAutovacuumWorker(void)
/*
* MaybeStartWalReceiver
* Start the WAL receiver process, if not running and our state allows.
*
* Note: if WalReceiverPID is already nonzero, it might seem that we should
* clear WalReceiverRequested. However, there's a race condition if the
* walreceiver terminates and the startup process immediately requests a new
* one: it's quite possible to get the signal for the request before reaping
* the dead walreceiver process. Better to risk launching an extra
* walreceiver than to miss launching one we need. (The walreceiver code
* has logic to recognize that it should go away if not needed.)
*/
static void
MaybeStartWalReceiver(void)
@ -5528,7 +5541,9 @@ MaybeStartWalReceiver(void)
Shutdown == NoShutdown)
{
WalReceiverPID = StartWalReceiver();
WalReceiverRequested = false;
if (WalReceiverPID != 0)
WalReceiverRequested = false;
/* else leave the flag set, so we'll try again later */
}
}