The postmaster keeps signals blocked everywhere except while waiting for something to happen in ServerLoop(). The code expects that the select(2) will be cancelled with EINTR if an interrupt occurs; without that, followup actions that should be performed by ServerLoop() itself will be delayed. However, some platforms interpret the SA_RESTART signal flag as meaning that they should restart rather than cancel the select(2). Worse yet, some of them restart it with the original timeout delay, meaning that a steady stream of signal interrupts can prevent ServerLoop() from iterating at all if there are no incoming connection requests.

Observable symptoms of this, on an affected platform such as HPUX 10, include extremely slow parallel query startup (possibly as much as 30 seconds) and failure to update timestamps on the postmaster's sockets and lockfiles when no new connections arrive for a long time.

We can fix this by running the postmaster's signal handlers without SA_RESTART. That would be quite a scary change if the range of code where signals are accepted weren't so tiny, but as it is, it seems safe enough. (Note that postmaster children do, and must, reset all the handlers before unblocking signals; so this change should not affect any child process.)

There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point it might be appropriate to revert this patch. But that's not happening before v11 at the earliest.

Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us

||
---|---|---|
.gitignore | ||
Makefile | ||
README | ||
chklocale.c | ||
crypt.c | ||
dirent.c | ||
dirmod.c | ||
erand48.c | ||
fls.c | ||
fseeko.c | ||
getaddrinfo.c | ||
getopt.c | ||
getopt_long.c | ||
getpeereid.c | ||
getrusage.c | ||
gettimeofday.c | ||
inet_aton.c | ||
inet_net_ntop.c | ||
isinf.c | ||
kill.c | ||
mkdtemp.c | ||
noblock.c | ||
open.c | ||
path.c | ||
pg_crc32c_choose.c | ||
pg_crc32c_sb8.c | ||
pg_crc32c_sse42.c | ||
pg_strong_random.c | ||
pgcheckdir.c | ||
pgmkdirp.c | ||
pgsleep.c | ||
pgstrcasecmp.c | ||
pqsignal.c | ||
pthread-win32.h | ||
qsort.c | ||
qsort_arg.c | ||
quotes.c | ||
random.c | ||
rint.c | ||
snprintf.c | ||
sprompt.c | ||
srandom.c | ||
strerror.c | ||
strlcat.c | ||
strlcpy.c | ||
system.c | ||
tar.c | ||
thread.c | ||
unsetenv.c | ||
win32.ico | ||
win32env.c | ||
win32error.c | ||
win32security.c | ||
win32setlocale.c | ||
win32ver.rc |
README
src/port/README

libpgport
=========

libpgport must have special behavior.  It supplies functions to both
libraries and applications.  However, there are two complexities:

1)  Libraries need to use object files that are compiled with exactly
the same flags as the library.  libpgport might not use the same flags,
so it is necessary to recompile the object files for individual
libraries.  This is done by removing -lpgport from the link line:

        # Need to recompile any libpgport object files
        LIBS := $(filter-out -lpgport, $(LIBS))

and adding infrastructure to recompile the object files:

        OBJS= execute.o typename.o descriptor.o data.o error.o prepare.o memory.o \
                connect.o misc.o path.o exec.o \
                $(filter snprintf.o, $(LIBOBJS))

The problem is that there is no testing of which object files need to be
added, but missing functions usually show up when linking user
applications.

2) For applications, we use -lpgport before -lpq, so the static files
from libpgport are linked first.  This avoids having applications
dependent on symbols that are _used_ by libpq, but not intended to be
exported by libpq.  libpq's libpgport usage changes over time, so such a
dependency is a problem.  Windows, Linux, and macOS use an export list to
control the symbols exported by libpq.