Block signals while allocating DSM memory.

On Linux, we call posix_fallocate() on shm_open()'d memory to avoid
later potential SIGBUS (see commit 899bd785).

Based on field reports of systems stuck in an EINTR retry loop there,
we made it possible to break out of that loop via slightly odd coding
where the CHECK_FOR_INTERRUPTS() call was somewhat removed from the
loop (see commit 422952ee).

On further reflection, that was not a great choice for at least two
reasons:

1.  If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing
and the EINTR error would be surfaced to the user.

2.  If EINTR was reported but neither QueryCancelPending nor
ProcDiePending was set, then we'd dutifully retry, but with a bit more
understanding of how posix_fallocate() works, it's now clear that you
can get into a loop that never terminates.  posix_fallocate() is not a
function that can do some of the job and tell you about progress if it's
interrupted; it has to undo what it's done so far and report EINTR.  If
signals keep arriving faster than it can complete (cf. recovery conflict
signals), you're stuck.

Therefore, for now, we'll simply block most signals to guarantee
progress.  SIGQUIT is not blocked (see InitPostmasterChild()), because
its expected handler doesn't return, and unblockable signals like
SIGCONT are not expected to arrive at a high rate.  For good measure,
we'll include the ftruncate() call in the blocked region, and add a
retry loop.
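
As a rough standalone illustration of the approach described above (not
the PostgreSQL code itself), the same pattern can be sketched with plain
POSIX calls.  The helper name resize_with_signals_blocked() is
hypothetical, and sigfillset() minus SIGQUIT is only an assumed stand-in
for PostgreSQL's BlockSig mask.  The sketch returns the error number
directly rather than saving and restoring errno around the unblock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resize fd to "size" bytes and reserve its backing
 * storage while most signals are blocked, so that the all-or-nothing
 * posix_fallocate() call gets a chance to finish.
 *
 * Returns 0 on success, or an errno-style error number on failure.
 */
static int
resize_with_signals_blocked(int fd, off_t size)
{
    sigset_t    block;
    sigset_t    saved;
    int         rc;

    /* Assumption: "everything except SIGQUIT" approximates BlockSig. */
    sigfillset(&block);
    sigdelset(&block, SIGQUIT);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return errno;

    /* Truncate (or extend) the file, retrying if interrupted. */
    do
    {
        rc = ftruncate(fd, size);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        rc = errno;

    if (rc == 0)
    {
        /*
         * posix_fallocate() returns the error number instead of setting
         * errno.  Unblockable signals can still interrupt it, so keep a
         * plain EINTR retry loop.
         */
        do
        {
            rc = posix_fallocate(fd, 0, size);
        } while (rc == EINTR);
    }

    /* Restore the caller's signal mask; rc already holds the result. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```

A caller would pass a descriptor from shm_open() or open() and treat a
nonzero return as an errno value.  Signals that arrive while blocked stay
pending and are delivered once the original mask is restored.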

Back-patch to all supported releases.

Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Nicola Contu <nicola.contu@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
Thomas Munro 2022-07-13 16:16:07 +12:00
parent 82785effc0
commit 4518c798b2
1 changed files with 22 additions and 13 deletions

@@ -62,6 +62,7 @@
 #endif
 
 #include "common/file_perm.h"
+#include "libpq/pqsignal.h" /* for PG_SETMASK macro */
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "portability/mem.h"
@@ -306,14 +307,6 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		shm_unlink(name);
 		errno = save_errno;
 
-		/*
-		 * If we received a query cancel or termination signal, we will have
-		 * EINTR set here. If the caller said that errors are OK here, check
-		 * for interrupts immediately.
-		 */
-		if (errno == EINTR && elevel >= ERROR)
-			CHECK_FOR_INTERRUPTS();
-
 		ereport(elevel,
 				(errcode_for_dynamic_shared_memory(),
 				 errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
@@ -361,9 +354,21 @@ static int
 dsm_impl_posix_resize(int fd, off_t size)
 {
 	int rc;
+	int save_errno;
+
+	/*
+	 * Block all blockable signals, except SIGQUIT. posix_fallocate() can run
+	 * for quite a long time, and is an all-or-nothing operation. If we
+	 * allowed SIGUSR1 to interrupt us repeatedly (for example, due to recovery
+	 * conflicts), the retry loop might never succeed.
+	 */
+	PG_SETMASK(&BlockSig);
 
 	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	do
+	{
+		rc = ftruncate(fd, size);
+	} while (rc < 0 && errno == EINTR);
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file. After resizing with
@@ -377,15 +382,15 @@ dsm_impl_posix_resize(int fd, off_t size)
 	if (rc == 0)
 	{
 		/*
-		 * We may get interrupted. If so, just retry unless there is an
-		 * interrupt pending. This avoids the possibility of looping forever
-		 * if another backend is repeatedly trying to interrupt us.
+		 * We still use a traditional EINTR retry loop to handle SIGCONT.
+		 * posix_fallocate() doesn't restart automatically, and we don't want
+		 * this to fail if you attach a debugger.
 		 */
 		pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
 		do
 		{
 			rc = posix_fallocate(fd, 0, size);
-		} while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
+		} while (rc == EINTR);
 		pgstat_report_wait_end();
 
 		/*
@@ -397,6 +402,10 @@ dsm_impl_posix_resize(int fd, off_t size)
 	}
 #endif /* HAVE_POSIX_FALLOCATE && __linux__ */
 
+	save_errno = errno;
+	PG_SETMASK(&UnBlockSig);
+	errno = save_errno;
+
 	return rc;
 }