Report failure to start a background worker.

When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
or if it is not marked BGW_NEVER_RESTART but is terminated before
startup succeeds, what BgwHandleStatus should be reported?  The
previous code really hadn't considered this possibility (as indicated
by the comments which ignore it completely) and would typically return
BGWH_NOT_YET_STARTED, but that's not a good answer, because then
there's no way for code using GetBackgroundWorkerPid() to tell the
difference between a worker that has not started but will start
later and a worker that has not started and will never be started.
So, when this case happens, return BGWH_STOPPED instead.  Update the
comments to reflect this.

The preceding fix by itself is insufficient to fix the problem,
because the old code also didn't send a notification to the process
identified in bgw_notify_pid when startup failed.  That might've
been technically correct under the theory that the status of the
worker was BGWH_NOT_YET_STARTED, because the status would indeed not
change when the worker failed to start, but now that we're more
usefully reporting BGWH_STOPPED, a notification is needed.

Without these fixes, code which starts background workers and then
uses the recommended APIs to wait for those background workers to
start would hang indefinitely if the postmaster failed to fork a
worker.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
This commit is contained in:
Robert Haas 2017-12-06 08:49:30 -05:00
parent 9c64ddd414
commit 28724fd90d
2 changed files with 24 additions and 8 deletions

View File

@ -1034,14 +1034,18 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
* Get the PID of a dynamically-registered background worker.
*
* If the worker is determined to be running, the return value will be
* BGWH_STARTED and *pidp will get the PID of the worker process.
* Otherwise, the return value will be BGWH_NOT_YET_STARTED if the worker
* hasn't been started yet, and BGWH_STOPPED if the worker was previously
* running but is no longer.
* BGWH_STARTED and *pidp will get the PID of the worker process. If the
* postmaster has not yet attempted to start the worker, the return value will
* be BGWH_NOT_YET_STARTED. Otherwise, the return value is BGWH_STOPPED.
*
* In the latter case, the worker may be stopped temporarily (if it is
* configured for automatic restart and exited non-zero) or gone for
* good (if it exited with code 0 or if it is configured not to restart).
* BGWH_STOPPED can indicate either that the worker is temporarily stopped
* (because it is configured for automatic restart and exited non-zero),
* or that the worker is permanently stopped (because it exited with exit
* code 0, or was not configured for automatic restart), or even that the
* worker was unregistered without ever starting (either because startup
* failed and the worker is not configured for automatic restart, or because
* TerminateBackgroundWorker was used before the worker was successfully
* started).
*/
BgwHandleStatus
GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
@ -1066,8 +1070,11 @@ GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
* time, but we assume such changes are atomic. So the value we read
* won't be garbage, but it might be out of date by the time the caller
* examines it (but that's unavoidable anyway).
*
* The in_use flag could be in the process of changing from true to false,
* but if it is already false then it can't change further.
*/
if (handle->generation != slot->generation)
if (handle->generation != slot->generation || !slot->in_use)
pid = 0;
else
pid = slot->pid;

View File

@ -5909,7 +5909,16 @@ maybe_start_bgworkers(void)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
int notify_pid;
notify_pid = rw->rw_worker.bgw_notify_pid;
ForgetBackgroundWorker(&iter);
/* Report worker is gone now. */
if (notify_pid != 0)
kill(notify_pid, SIGUSR1);
continue;
}