1996-07-09 08:22:35 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
1999-02-14 00:22:53 +01:00
|
|
|
* async.c
|
1998-10-06 04:40:09 +02:00
|
|
|
* Asynchronous notification: NOTIFY, LISTEN, UNLISTEN
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2021-01-02 19:06:25 +01:00
|
|
|
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
|
2000-01-26 06:58:53 +01:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/commands/async.c
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
1998-10-06 04:40:09 +02:00
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
2010-02-16 23:34:57 +01:00
|
|
|
* Async Notification Model as of 9.0:
|
|
|
|
*
|
|
|
|
* 1. Multiple backends on same machine. Multiple backends listening on
|
|
|
|
* several channels. (Channels are also called "conditions" in other
|
|
|
|
* parts of the code.)
|
|
|
|
*
|
|
|
|
* 2. There is one central queue in disk-based storage (directory pg_notify/),
|
|
|
|
* with actively-used pages mapped into shared memory by the slru.c module.
|
|
|
|
* All notification messages are placed in the queue and later read out
|
|
|
|
* by listening backends.
|
|
|
|
*
|
|
|
|
* There is no central knowledge of which backend listens on which channel;
|
|
|
|
* every backend has its own list of interesting channels.
|
|
|
|
*
|
|
|
|
* Although there is only one queue, notifications are treated as being
|
|
|
|
* database-local; this is done by including the sender's database OID
|
|
|
|
* in each notification message. Listening backends ignore messages
|
|
|
|
* that don't match their database OID. This is important because it
|
|
|
|
* ensures senders and receivers have the same database encoding and won't
|
|
|
|
* misinterpret non-ASCII text in the channel name or payload string.
|
|
|
|
*
|
|
|
|
* Since notifications are not expected to survive database crashes,
|
|
|
|
* we can simply clean out the pg_notify data at any reboot, and there
|
|
|
|
* is no need for WAL support or fsync'ing.
|
|
|
|
*
|
|
|
|
* 3. Every backend that is listening on at least one channel registers by
|
|
|
|
* entering its PID into the array in AsyncQueueControl. It then scans all
|
|
|
|
* incoming notifications in the central queue and first compares the
|
|
|
|
* database OID of the notification with its own database OID and then
|
|
|
|
* compares the notified channel with the list of channels that it listens
|
|
|
|
* to. In case there is a match it delivers the notification event to its
|
|
|
|
* frontend. Non-matching events are simply skipped.
|
|
|
|
*
|
|
|
|
* 4. The NOTIFY statement (routine Async_Notify) stores the notification in
|
|
|
|
* a backend-local list which will not be processed until transaction end.
|
|
|
|
*
|
|
|
|
* Duplicate notifications from the same transaction are sent out as one
|
|
|
|
* notification only. This is done to save work when for example a trigger
|
|
|
|
* on a 2 million row table fires a notification for each row that has been
|
|
|
|
* changed. If the application needs to receive every single notification
|
|
|
|
* that has been sent, it can easily add some unique string into the extra
|
|
|
|
* payload parameter.
|
|
|
|
*
|
|
|
|
* When the transaction is ready to commit, PreCommit_Notify() adds the
|
|
|
|
* pending notifications to the head of the queue. The head pointer of the
|
|
|
|
* queue always points to the next free position and a position is just a
|
|
|
|
* page number and the offset in that page. This is done before marking the
|
|
|
|
* transaction as committed in clog. If we run into problems writing the
|
|
|
|
* notifications, we can still call elog(ERROR, ...) and the transaction
|
|
|
|
* will roll back.
|
|
|
|
*
|
|
|
|
* Once we have put all of the notifications into the queue, we return to
|
|
|
|
* CommitTransaction() which will then do the actual transaction commit.
|
|
|
|
*
|
|
|
|
* After commit we are called another time (AtCommit_Notify()). Here we
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* make any actual updates to the effective listen state (listenChannels).
|
|
|
|
* Then we signal any backends that may be interested in our messages
|
|
|
|
* (including our own backend, if listening). This is done by
|
|
|
|
* SignalBackends(), which scans the list of listening backends and sends a
|
|
|
|
* PROCSIG_NOTIFY_INTERRUPT signal to every listening backend (we don't
|
|
|
|
* know which backend is listening on which channel so we must signal them
|
|
|
|
* all). We can exclude backends that are already up to date, though, and
|
|
|
|
* we can also exclude backends that are in other databases (unless they
|
|
|
|
* are way behind and should be kicked to make them advance their
|
|
|
|
* pointers).
|
|
|
|
*
|
|
|
|
* Finally, after we are out of the transaction altogether and about to go
|
|
|
|
* idle, we scan the queue for messages that need to be sent to our
|
|
|
|
* frontend (which might be notifies from other backends, or self-notifies
|
|
|
|
* from our own). This step is not part of the CommitTransaction sequence
|
|
|
|
* for two important reasons. First, we could get errors while sending
|
|
|
|
* data to our frontend, and it's really bad for errors to happen in
|
|
|
|
* post-commit cleanup. Second, in cases where a procedure issues commits
|
|
|
|
* within a single frontend command, we don't want to send notifies to our
|
|
|
|
* frontend until the command is done; but notifies to other backends
|
|
|
|
* should go out immediately after each commit.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* 5. Upon receipt of a PROCSIG_NOTIFY_INTERRUPT signal, the signal handler
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
* sets the process's latch, which triggers the event to be processed
|
|
|
|
* immediately if this backend is idle (i.e., it is waiting for a frontend
|
|
|
|
* command and is not within a transaction block. C.f.
|
|
|
|
* ProcessClientReadInterrupt()). Otherwise the handler may only set a
|
|
|
|
* flag, which will cause the processing to occur just before we next go
|
|
|
|
* idle.
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Inbound-notify processing consists of reading all of the notifications
|
|
|
|
* that have arrived since scanning last time. We read every notification
|
|
|
|
* until we reach either a notification from an uncommitted transaction or
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* the head pointer's position.
|
|
|
|
*
|
|
|
|
* 6. To avoid SLRU wraparound and limit disk space consumption, the tail
|
|
|
|
* pointer needs to be advanced so that old pages can be truncated.
|
|
|
|
* This is relatively expensive (notably, it requires an exclusive lock),
|
|
|
|
* so we don't want to do it often. We make sending backends do this work
|
|
|
|
* if they advanced the queue head into a new page, but only once every
|
|
|
|
* QUEUE_CLEANUP_DELAY pages.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* An application that listens on the same channel it notifies will get
|
1998-10-06 04:40:09 +02:00
|
|
|
* NOTIFY messages for its own NOTIFYs. These can be ignored, if not useful,
|
|
|
|
* by comparing be_pid in the NOTIFY message to the application's own backend's
|
|
|
|
* PID. (As of FE/BE protocol 2.0, the backend's PID is provided to the
|
|
|
|
* frontend during startup.) The above design guarantees that notifies from
|
2010-02-16 23:34:57 +01:00
|
|
|
* other backends will never be missed by ignoring self-notifies.
|
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
|
2010-02-16 23:34:57 +01:00
|
|
|
* can be varied without affecting anything but performance. The maximum
|
|
|
|
* amount of notification data that can be queued at one time is determined
|
|
|
|
* by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
|
1998-10-06 04:40:09 +02:00
|
|
|
*-------------------------------------------------------------------------
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2000-10-02 21:42:56 +02:00
|
|
|
#include "postgres.h"
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
#include <limits.h>
|
1996-10-21 11:37:26 +02:00
|
|
|
#include <unistd.h>
|
|
|
|
#include <signal.h>
|
1996-07-09 08:22:35 +02:00
|
|
|
|
2015-10-22 16:49:20 +02:00
|
|
|
#include "access/parallel.h"
|
2010-02-16 23:34:57 +01:00
|
|
|
#include "access/slru.h"
|
|
|
|
#include "access/transam.h"
|
2006-07-13 18:49:20 +02:00
|
|
|
#include "access/xact.h"
|
2010-02-16 23:34:57 +01:00
|
|
|
#include "catalog/pg_database.h"
|
1999-07-16 07:00:38 +02:00
|
|
|
#include "commands/async.h"
|
2020-02-27 04:55:41 +01:00
|
|
|
#include "common/hashfn.h"
|
2010-02-16 23:34:57 +01:00
|
|
|
#include "funcapi.h"
|
1998-04-27 06:08:07 +02:00
|
|
|
#include "libpq/libpq.h"
|
1999-04-25 05:19:27 +02:00
|
|
|
#include "libpq/pqformat.h"
|
1998-04-27 06:08:07 +02:00
|
|
|
#include "miscadmin.h"
|
2002-05-05 02:03:29 +02:00
|
|
|
#include "storage/ipc.h"
|
2010-02-16 23:34:57 +01:00
|
|
|
#include "storage/lmgr.h"
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
#include "storage/proc.h"
|
Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted. The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction. Which
of course would usually be the point of receiving the message. We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.
The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing. But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.
Per bug #9557 from Marko Tiikkaja. This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.
Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 17:02:54 +01:00
|
|
|
#include "storage/procarray.h"
|
2009-07-31 22:26:23 +02:00
|
|
|
#include "storage/procsignal.h"
|
2004-05-23 05:50:45 +02:00
|
|
|
#include "storage/sinval.h"
|
2000-11-21 22:16:06 +01:00
|
|
|
#include "tcop/tcopprot.h"
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
#include "utils/builtins.h"
|
2005-05-06 19:24:55 +02:00
|
|
|
#include "utils/memutils.h"
|
1999-07-16 07:00:38 +02:00
|
|
|
#include "utils/ps_status.h"
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
#include "utils/snapmgr.h"
|
2011-09-09 19:23:41 +02:00
|
|
|
#include "utils/timestamp.h"
|
2000-05-31 02:28:42 +02:00
|
|
|
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Maximum size of a NOTIFY payload, including terminating NULL. This
|
|
|
|
* must be kept small enough so that a notification message fits on one
|
2010-02-17 01:52:09 +01:00
|
|
|
* SLRU page. The magic fudge factor here is noncritical as long as it's
|
|
|
|
* more than AsyncQueueEntryEmptySize --- we make it significantly bigger
|
|
|
|
* than that, so changes in that data structure won't affect user-visible
|
|
|
|
* restrictions.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
2010-02-17 01:52:09 +01:00
|
|
|
#define NOTIFY_PAYLOAD_MAX_LENGTH (BLCKSZ - NAMEDATALEN - 128)
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Struct representing an entry in the global notify queue
|
|
|
|
*
|
|
|
|
* This struct declaration has the maximal length, but in a real queue entry
|
|
|
|
* the data area is only big enough for the actual channel and payload strings
|
|
|
|
* (each null-terminated). AsyncQueueEntryEmptySize is the minimum possible
|
|
|
|
* entry size, if both channel and payload strings are empty (but note it
|
|
|
|
* doesn't include alignment padding).
|
|
|
|
*
|
|
|
|
* The "length" field should always be rounded up to the next QUEUEALIGN
|
|
|
|
* multiple so that all fields are properly aligned.
|
|
|
|
*/
|
|
|
|
typedef struct AsyncQueueEntry
|
|
|
|
{
|
|
|
|
int length; /* total allocated length of entry */
|
|
|
|
Oid dboid; /* sender's database OID */
|
|
|
|
TransactionId xid; /* sender's XID */
|
|
|
|
int32 srcPid; /* sender's PID */
|
|
|
|
char data[NAMEDATALEN + NOTIFY_PAYLOAD_MAX_LENGTH];
|
|
|
|
} AsyncQueueEntry;
|
|
|
|
|
|
|
|
/* Currently, no field of AsyncQueueEntry requires more than int alignment */
|
|
|
|
#define QUEUEALIGN(len) INTALIGN(len)
|
|
|
|
|
|
|
|
#define AsyncQueueEntryEmptySize (offsetof(AsyncQueueEntry, data) + 2)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Struct describing a queue position, and assorted macros for working with it
|
|
|
|
*/
|
|
|
|
typedef struct QueuePosition
|
|
|
|
{
|
|
|
|
int page; /* SLRU page number */
|
|
|
|
int offset; /* byte offset within page */
|
|
|
|
} QueuePosition;
|
|
|
|
|
|
|
|
#define QUEUE_POS_PAGE(x) ((x).page)
|
|
|
|
#define QUEUE_POS_OFFSET(x) ((x).offset)
|
|
|
|
|
|
|
|
#define SET_QUEUE_POS(x,y,z) \
|
|
|
|
do { \
|
|
|
|
(x).page = (y); \
|
|
|
|
(x).offset = (z); \
|
|
|
|
} while (0)
|
|
|
|
|
|
|
|
#define QUEUE_POS_EQUAL(x,y) \
|
Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste. Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.
The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics. Delaying the
initialization into the first child that does a NOTIFY fixes that.
Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue. Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.
Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-14 04:48:09 +02:00
|
|
|
((x).page == (y).page && (x).offset == (y).offset)
|
|
|
|
|
|
|
|
#define QUEUE_POS_IS_ZERO(x) \
|
|
|
|
((x).page == 0 && (x).offset == 0)
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/* choose logically smaller QueuePosition */
|
|
|
|
#define QUEUE_POS_MIN(x,y) \
|
2011-09-28 16:32:38 +02:00
|
|
|
(asyncQueuePagePrecedes((x).page, (y).page) ? (x) : \
|
2010-02-16 23:34:57 +01:00
|
|
|
(x).page != (y).page ? (y) : \
|
|
|
|
(x).offset < (y).offset ? (x) : (y))
|
|
|
|
|
2015-10-01 05:32:22 +02:00
|
|
|
/* choose logically larger QueuePosition */
|
|
|
|
#define QUEUE_POS_MAX(x,y) \
|
|
|
|
(asyncQueuePagePrecedes((x).page, (y).page) ? (y) : \
|
|
|
|
(x).page != (y).page ? (x) : \
|
|
|
|
(x).offset > (y).offset ? (x) : (y))
|
|
|
|
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
/*
|
|
|
|
* Parameter determining how often we try to advance the tail pointer:
|
|
|
|
* we do that after every QUEUE_CLEANUP_DELAY pages of NOTIFY data. This is
|
|
|
|
* also the distance by which a backend in another database needs to be
|
|
|
|
* behind before we'll decide we need to wake it up to advance its pointer.
|
|
|
|
*
|
|
|
|
* Resist the temptation to make this really large. While that would save
|
|
|
|
* work in some places, it would add cost in others. In particular, this
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* catch up before the pages they'll need to read fall out of SLRU cache.
|
|
|
|
*/
|
|
|
|
#define QUEUE_CLEANUP_DELAY 4
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Struct describing a listening backend's status
|
|
|
|
*/
|
|
|
|
typedef struct QueueBackendStatus
|
|
|
|
{
|
|
|
|
int32 pid; /* either a PID or InvalidPid */
|
2015-10-01 05:32:22 +02:00
|
|
|
Oid dboid; /* backend's database OID, or InvalidOid */
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
BackendId nextListener; /* id of next listener, or InvalidBackendId */
|
2010-02-16 23:34:57 +01:00
|
|
|
QueuePosition pos; /* backend has read queue up to here */
|
|
|
|
} QueueBackendStatus;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Shared memory state for LISTEN/NOTIFY (excluding its SLRU stuff)
|
|
|
|
*
|
2020-08-15 19:15:53 +02:00
|
|
|
* The AsyncQueueControl structure is protected by the NotifyQueueLock and
|
|
|
|
* NotifyQueueTailLock.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
2020-08-15 19:15:53 +02:00
|
|
|
* When holding NotifyQueueLock in SHARED mode, backends may only inspect
|
|
|
|
* their own entries as well as the head and tail pointers. Consequently we
|
|
|
|
* can allow a backend to update its own record while holding only SHARED lock
|
|
|
|
* (since no other backend will inspect it).
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
2020-08-15 19:15:53 +02:00
|
|
|
* When holding NotifyQueueLock in EXCLUSIVE mode, backends can inspect the
|
|
|
|
* entries of other backends and also change the head pointer. When holding
|
|
|
|
* both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
* can change the tail pointers.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
|
2020-08-15 19:15:53 +02:00
|
|
|
* In order to avoid deadlocks, whenever we need multiple locks, we first get
|
|
|
|
* NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* Each backend uses the backend[] array entry with index equal to its
|
|
|
|
* BackendId (which can range from 1 to MaxBackends). We rely on this to make
|
|
|
|
* SendProcSignal fast.
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
*
|
|
|
|
* The backend[] array entries for actively-listening backends are threaded
|
|
|
|
* together using firstListener and the nextListener links, so that we can
|
|
|
|
* scan them without having to iterate over inactive entries. We keep this
|
|
|
|
* list in order by BackendId so that the scan is cache-friendly when there
|
|
|
|
* are many active entries.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
typedef struct AsyncQueueControl
|
|
|
|
{
|
|
|
|
QueuePosition head; /* head points to the next free location */
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
QueuePosition tail; /* tail must be <= the queue position of every
|
|
|
|
* listening backend */
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
int stopPage; /* oldest unrecycled page; must be <=
|
|
|
|
* tail.page */
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
BackendId firstListener; /* id of first listener, or InvalidBackendId */
|
2010-02-16 23:34:57 +01:00
|
|
|
TimestampTz lastQueueFillWarn; /* time of last queue-full msg */
|
2015-02-21 22:12:14 +01:00
|
|
|
QueueBackendStatus backend[FLEXIBLE_ARRAY_MEMBER];
|
|
|
|
/* backend[0] is not used; used entries are from [1] to [MaxBackends] */
|
2010-02-16 23:34:57 +01:00
|
|
|
} AsyncQueueControl;
|
|
|
|
|
|
|
|
static AsyncQueueControl *asyncQueueControl;
|
|
|
|
|
|
|
|
#define QUEUE_HEAD (asyncQueueControl->head)
|
|
|
|
#define QUEUE_TAIL (asyncQueueControl->tail)
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
#define QUEUE_STOP_PAGE (asyncQueueControl->stopPage)
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
#define QUEUE_FIRST_LISTENER (asyncQueueControl->firstListener)
|
2010-02-16 23:34:57 +01:00
|
|
|
#define QUEUE_BACKEND_PID(i) (asyncQueueControl->backend[i].pid)
|
2015-10-01 05:32:22 +02:00
|
|
|
#define QUEUE_BACKEND_DBOID(i) (asyncQueueControl->backend[i].dboid)
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
#define QUEUE_NEXT_LISTENER(i) (asyncQueueControl->backend[i].nextListener)
|
2010-02-16 23:34:57 +01:00
|
|
|
#define QUEUE_BACKEND_POS(i) (asyncQueueControl->backend[i].pos)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The SLRU buffer area through which we access the notification queue
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
static SlruCtlData NotifyCtlData;
|
2010-02-16 23:34:57 +01:00
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
#define NotifyCtl (&NotifyCtlData)
|
2010-02-16 23:34:57 +01:00
|
|
|
#define QUEUE_PAGESIZE BLCKSZ
|
|
|
|
#define QUEUE_FULL_WARN_INTERVAL 5000 /* warn at most once every 5s */
|
|
|
|
|
|
|
|
/*
|
2020-08-16 05:21:52 +02:00
|
|
|
* Use segments 0000 through FFFF. Each contains SLRU_PAGES_PER_SEGMENT pages
|
|
|
|
* which gives us the pages from 0 to SLRU_PAGES_PER_SEGMENT * 0x10000 - 1.
|
|
|
|
* We could use as many segments as SlruScanDirectory() allows, but this gives
|
|
|
|
* us so much space already that it doesn't seem worth the trouble.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* The most data we can have in the queue at a time is QUEUE_MAX_PAGE/2
|
|
|
|
* pages, because more than that would confuse slru.c into thinking there
|
|
|
|
* was a wraparound condition. With the default BLCKSZ this means there
|
|
|
|
* can be up to 8GB of queued-and-not-read data.
|
|
|
|
*
|
|
|
|
* Note: it's possible to redefine QUEUE_MAX_PAGE with a smaller multiple of
|
|
|
|
* SLRU_PAGES_PER_SEGMENT, for easier testing of queue-full behaviour.
|
|
|
|
*/
|
|
|
|
#define QUEUE_MAX_PAGE (SLRU_PAGES_PER_SEGMENT * 0x10000 - 1)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* listenChannels identifies the channels we are actually listening to
|
|
|
|
* (ie, have committed a LISTEN on). It is a simple list of channel names,
|
|
|
|
* allocated in TopMemoryContext.
|
|
|
|
*/
|
|
|
|
static List *listenChannels = NIL; /* list of C strings */
|
|
|
|
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
/*
|
|
|
|
* State for pending LISTEN/UNLISTEN actions consists of an ordered list of
|
|
|
|
* all actions requested in the current transaction. As explained above,
|
2010-02-16 23:34:57 +01:00
|
|
|
* we don't actually change listenChannels until we reach transaction commit.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*
|
|
|
|
* The list is kept in CurTransactionContext. In subtransactions, each
|
|
|
|
* subtransaction has its own list in its own CurTransactionContext, but
|
|
|
|
* successful subtransactions attach their lists to their parent's list.
|
|
|
|
* Failed subtransactions simply discard their lists.
|
|
|
|
*/
|
|
|
|
typedef enum
|
|
|
|
{
|
|
|
|
LISTEN_LISTEN,
|
|
|
|
LISTEN_UNLISTEN,
|
|
|
|
LISTEN_UNLISTEN_ALL
|
|
|
|
} ListenActionKind;
|
|
|
|
|
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
ListenActionKind action;
|
2015-02-21 22:12:14 +01:00
|
|
|
char channel[FLEXIBLE_ARRAY_MEMBER]; /* nul-terminated string */
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
} ListenAction;
|
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
typedef struct ActionList
|
|
|
|
{
|
|
|
|
int nestingLevel; /* current transaction nesting depth */
|
Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail. Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability. To fix, just make the function call
asyncQueueAdvanceTail before computing its result. That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.
Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before). Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker. There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.
In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 20:09:33 +01:00
|
|
|
List *actions; /* list of ListenAction structs */
|
2019-10-04 14:19:25 +02:00
|
|
|
struct ActionList *upper; /* details for upper transaction levels */
|
|
|
|
} ActionList;
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
static ActionList *pendingActions = NULL;
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
1998-10-06 04:40:09 +02:00
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* State for outbound notifies consists of a list of all channels+payloads
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
* NOTIFYed in the current transaction. We do not actually perform a NOTIFY
|
|
|
|
* until and unless the transaction commits. pendingNotifies is NULL if no
|
|
|
|
* NOTIFYs have been done in the current (sub) transaction.
|
|
|
|
*
|
|
|
|
* We discard duplicate notify events issued in the same transaction.
|
|
|
|
* Hence, in addition to the list proper (which we need to track the order
|
|
|
|
* of the events, since we guarantee to deliver them in order), we build a
|
|
|
|
* hash table which we can probe to detect duplicates. Since building the
|
|
|
|
* hash table is somewhat expensive, we do so only once we have at least
|
|
|
|
* MIN_HASHABLE_NOTIFIES events queued in the current (sub) transaction;
|
|
|
|
* before that we just scan the events linearly.
|
2004-07-01 02:52:04 +02:00
|
|
|
*
|
|
|
|
* The list is kept in CurTransactionContext. In subtransactions, each
|
|
|
|
* subtransaction has its own list in its own CurTransactionContext, but
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
* successful subtransactions add their entries to their parent's list.
|
|
|
|
* Failed subtransactions simply discard their lists. Since these lists
|
|
|
|
* are independent, there may be notify events in a subtransaction's list
|
|
|
|
* that duplicate events in some ancestor (sub) transaction; we get rid of
|
|
|
|
* the dups when merging the subtransaction's list into its parent's.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*
|
|
|
|
* Note: the action and notify lists do not interact within a transaction.
|
|
|
|
* In particular, if a transaction does NOTIFY and then LISTEN on the same
|
|
|
|
* condition name, it will get a self-notify at commit. This is a bit odd
|
|
|
|
* but is consistent with our historical behavior.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
typedef struct Notification
|
|
|
|
{
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
uint16 channel_len; /* length of channel-name string */
|
|
|
|
uint16 payload_len; /* length of payload string */
|
|
|
|
/* null-terminated channel name, then null-terminated payload follow */
|
|
|
|
char data[FLEXIBLE_ARRAY_MEMBER];
|
2010-02-16 23:34:57 +01:00
|
|
|
} Notification;
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
typedef struct NotificationList
|
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
int nestingLevel; /* current transaction nesting depth */
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
List *events; /* list of Notification structs */
|
|
|
|
HTAB *hashtab; /* hash of NotificationHash structs, or NULL */
|
Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail. Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability. To fix, just make the function call
asyncQueueAdvanceTail before computing its result. That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.
Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before). Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker. There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.
In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 20:09:33 +01:00
|
|
|
struct NotificationList *upper; /* details for upper transaction levels */
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
} NotificationList;
|
|
|
|
|
|
|
|
#define MIN_HASHABLE_NOTIFIES 16 /* threshold to build hashtab */
|
|
|
|
|
|
|
|
typedef struct NotificationHash
|
|
|
|
{
|
|
|
|
Notification *event; /* => the actual Notification struct */
|
|
|
|
} NotificationHash;
|
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
static NotificationList *pendingNotifies = NULL;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
* Inbound notifications are initially processed by HandleNotifyInterrupt(),
|
|
|
|
* called from inside a signal handler. That just sets the
|
|
|
|
* notifyInterruptPending flag and sets the process
|
|
|
|
* latch. ProcessNotifyInterrupt() will then be called whenever it's safe to
|
|
|
|
* actually deal with the interrupt.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
volatile sig_atomic_t notifyInterruptPending = false;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2000-11-29 00:27:57 +01:00
|
|
|
/* True if we've registered an on_shmem_exit cleanup */
|
|
|
|
static bool unlistenExitRegistered = false;
|
2010-02-26 03:01:40 +01:00
|
|
|
|
2013-02-13 18:48:05 +01:00
|
|
|
/* True if we're currently registered as a listener in asyncQueueControl */
|
|
|
|
static bool amRegisteredListener = false;
|
|
|
|
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
/* have we advanced to a page that's a multiple of QUEUE_CLEANUP_DELAY? */
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
static bool tryAdvanceTail = false;
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* GUC parameter */
|
2001-06-18 00:27:15 +02:00
|
|
|
bool Trace_notify = false;
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* local function prototypes */
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
static int asyncQueuePageDiff(int p, int q);
|
2011-09-28 16:32:38 +02:00
|
|
|
static bool asyncQueuePagePrecedes(int p, int q);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void queue_listen(ListenActionKind action, const char *channel);
|
2003-12-12 19:45:10 +01:00
|
|
|
static void Async_UnlistenOnExit(int code, Datum arg);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void Exec_ListenPreCommit(void);
|
|
|
|
static void Exec_ListenCommit(const char *channel);
|
|
|
|
static void Exec_UnlistenCommit(const char *channel);
|
|
|
|
static void Exec_UnlistenAllCommit(void);
|
|
|
|
static bool IsListeningOn(const char *channel);
|
|
|
|
static void asyncQueueUnregister(void);
|
|
|
|
static bool asyncQueueIsFull(void);
|
2015-01-26 17:57:33 +01:00
|
|
|
static bool asyncQueueAdvance(volatile QueuePosition *position, int entryLength);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe);
|
|
|
|
static ListCell *asyncQueueAddEntries(ListCell *nextNotify);
|
2015-07-17 15:12:03 +02:00
|
|
|
static double asyncQueueUsage(void);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void asyncQueueFillWarning(void);
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
static void SignalBackends(void);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void asyncQueueReadAllNotifications(void);
|
2015-01-26 17:57:33 +01:00
|
|
|
static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
|
2010-02-16 23:34:57 +01:00
|
|
|
QueuePosition stop,
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
char *page_buffer,
|
|
|
|
Snapshot snapshot);
|
2010-02-16 23:34:57 +01:00
|
|
|
static void asyncQueueAdvanceTail(void);
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
static void ProcessIncomingNotify(bool flush);
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
static bool AsyncExistsPendingNotify(Notification *n);
|
|
|
|
static void AddEventToPendingNotifies(Notification *n);
|
|
|
|
static uint32 notification_hash(const void *key, Size keysize);
|
|
|
|
static int notification_match(const void *key1, const void *key2, Size keysize);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
static void ClearPendingActionsAndNotifies(void);
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* Compute the difference between two queue page numbers (i.e., p - q),
|
|
|
|
* accounting for wraparound.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
static int
|
|
|
|
asyncQueuePageDiff(int p, int q)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
int diff;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We have to compare modulo (QUEUE_MAX_PAGE+1)/2. Both inputs should be
|
|
|
|
* in the range 0..QUEUE_MAX_PAGE.
|
|
|
|
*/
|
|
|
|
Assert(p >= 0 && p <= QUEUE_MAX_PAGE);
|
|
|
|
Assert(q >= 0 && q <= QUEUE_MAX_PAGE);
|
|
|
|
|
|
|
|
diff = p - q;
|
|
|
|
if (diff >= ((QUEUE_MAX_PAGE + 1) / 2))
|
|
|
|
diff -= QUEUE_MAX_PAGE + 1;
|
|
|
|
else if (diff < -((QUEUE_MAX_PAGE + 1) / 2))
|
|
|
|
diff += QUEUE_MAX_PAGE + 1;
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
return diff;
|
|
|
|
}
|
|
|
|
|
2021-01-16 21:21:35 +01:00
|
|
|
/*
|
|
|
|
* Is p < q, accounting for wraparound?
|
|
|
|
*
|
|
|
|
* Since asyncQueueIsFull() blocks creation of a page that could precede any
|
|
|
|
* extant page, we need not assess entries within a page.
|
|
|
|
*/
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
static bool
|
|
|
|
asyncQueuePagePrecedes(int p, int q)
|
|
|
|
{
|
|
|
|
return asyncQueuePageDiff(p, q) < 0;
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Report space needed for our shared memory area
|
|
|
|
*/
|
|
|
|
Size
|
|
|
|
AsyncShmemSize(void)
|
|
|
|
{
|
|
|
|
Size size;
|
|
|
|
|
|
|
|
/* This had better match AsyncShmemInit */
|
2015-02-21 22:12:14 +01:00
|
|
|
size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
|
|
|
|
size = add_size(size, offsetof(AsyncQueueControl, backend));
|
2010-02-16 23:34:57 +01:00
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
return size;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize our shared memory area
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
AsyncShmemInit(void)
|
|
|
|
{
|
|
|
|
bool found;
|
|
|
|
Size size;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create or attach to the AsyncQueueControl structure.
|
|
|
|
*
|
2015-02-21 22:12:14 +01:00
|
|
|
* The used entries in the backend[] array run from 1 to MaxBackends; the
|
|
|
|
* zero'th entry is unused but must be allocated.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
2015-02-21 22:12:14 +01:00
|
|
|
size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
|
|
|
|
size = add_size(size, offsetof(AsyncQueueControl, backend));
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
asyncQueueControl = (AsyncQueueControl *)
|
|
|
|
ShmemInitStruct("Async Queue Control", size, &found);
|
|
|
|
|
|
|
|
if (!found)
|
|
|
|
{
|
|
|
|
/* First time through, so initialize it */
|
|
|
|
SET_QUEUE_POS(QUEUE_HEAD, 0, 0);
|
|
|
|
SET_QUEUE_POS(QUEUE_TAIL, 0, 0);
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
QUEUE_STOP_PAGE = 0;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
QUEUE_FIRST_LISTENER = InvalidBackendId;
|
2010-02-16 23:34:57 +01:00
|
|
|
asyncQueueControl->lastQueueFillWarn = 0;
|
|
|
|
/* zero'th entry won't be used, but let's initialize it anyway */
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
for (int i = 0; i <= MaxBackends; i++)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
QUEUE_BACKEND_PID(i) = InvalidPid;
|
2015-10-01 05:32:22 +02:00
|
|
|
QUEUE_BACKEND_DBOID(i) = InvalidOid;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
QUEUE_NEXT_LISTENER(i) = InvalidBackendId;
|
2010-02-16 23:34:57 +01:00
|
|
|
SET_QUEUE_POS(QUEUE_BACKEND_POS(i), 0, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set up SLRU management of the pg_notify data.
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
|
|
|
|
SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
|
Defer flushing of SLRU files.
Previously, we called fsync() after writing out individual pg_xact,
pg_multixact and pg_commit_ts pages due to cache pressure, leading to
regular I/O stalls in user backends and recovery. Collapse requests for
the same file into a single system call as part of the next checkpoint,
as we already did for relation files, using the infrastructure developed
by commit 3eb77eba. This can cause a significant improvement to
recovery performance, especially when it's otherwise CPU-bound.
Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer
that it applies to all the SLRU mini-buffer-pools as well as the main
buffer pool. Rearrange things so that data collected in CheckpointStats
includes SLRU activity.
Also remove the Shutdown{CLOG,CommitTS,SUBTRANS,MultiXact}() functions,
because they were redundant after the shutdown checkpoint that
immediately precedes them. (I'm not sure if they were ever needed, but
they aren't now.)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (parts)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Discussion: https://postgr.es/m/CA+hUKGLJ=84YT+NvhkEEDAuUtVHMfQ9i-N7k_o50JmQ6Rpj_OQ@mail.gmail.com
2020-09-25 08:49:43 +02:00
|
|
|
NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
|
|
|
|
SYNC_HANDLER_NONE);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
if (!found)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* During start or reboot, clean out the pg_notify directory.
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
(void) SlruScanDirectory(NotifyCtl, SlruScanDirCbDeleteAll, NULL);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* pg_notify -
|
|
|
|
* SQL function to send a notification event
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
pg_notify(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
const char *channel;
|
|
|
|
const char *payload;
|
|
|
|
|
|
|
|
if (PG_ARGISNULL(0))
|
|
|
|
channel = "";
|
|
|
|
else
|
|
|
|
channel = text_to_cstring(PG_GETARG_TEXT_PP(0));
|
|
|
|
|
|
|
|
if (PG_ARGISNULL(1))
|
|
|
|
payload = "";
|
|
|
|
else
|
|
|
|
payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
|
|
|
|
|
2010-02-20 22:24:02 +01:00
|
|
|
/* For NOTIFY as a statement, this is checked in ProcessUtility */
|
|
|
|
PreventCommandDuringRecovery("NOTIFY");
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
Async_Notify(channel, payload);
|
|
|
|
|
|
|
|
PG_RETURN_VOID();
|
|
|
|
}
|
|
|
|
|
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* Async_Notify
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
1998-08-30 23:05:27 +02:00
|
|
|
* This is executed by the SQL notify command.
|
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Adds the message to the list of pending notifies.
|
1998-10-06 04:40:09 +02:00
|
|
|
* Actual notification happens during transaction commit.
|
|
|
|
* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
|
|
|
void
|
2010-02-16 23:34:57 +01:00
|
|
|
Async_Notify(const char *channel, const char *payload)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
size_t channel_len;
|
|
|
|
size_t payload_len;
|
2010-02-16 23:34:57 +01:00
|
|
|
Notification *n;
|
|
|
|
MemoryContext oldcontext;
|
|
|
|
|
2015-10-22 16:49:20 +02:00
|
|
|
if (IsParallelWorker())
|
|
|
|
elog(ERROR, "cannot send notifications from a parallel worker");
|
2015-10-16 17:48:48 +02:00
|
|
|
|
2000-05-31 02:28:42 +02:00
|
|
|
if (Trace_notify)
|
2010-02-16 23:34:57 +01:00
|
|
|
elog(DEBUG1, "Async_Notify(%s)", channel);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
channel_len = channel ? strlen(channel) : 0;
|
|
|
|
payload_len = payload ? strlen(payload) : 0;
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* a channel name must be specified */
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
if (channel_len == 0)
|
2010-02-16 23:34:57 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("channel name cannot be empty")));
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* enforce length limits */
|
|
|
|
if (channel_len >= NAMEDATALEN)
|
2010-02-16 23:34:57 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("channel name too long")));
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
if (payload_len >= NOTIFY_PAYLOAD_MAX_LENGTH)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("payload string too long")));
|
2001-06-18 00:27:15 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
* We must construct the Notification entry, even if we end up not using
|
|
|
|
* it, in order to compare it cheaply to existing list entries.
|
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* The notification list needs to live until end of transaction, so store
|
|
|
|
* it in the transaction context.
|
|
|
|
*/
|
|
|
|
oldcontext = MemoryContextSwitchTo(CurTransactionContext);
|
2001-06-18 00:27:15 +02:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
n = (Notification *) palloc(offsetof(Notification, data) +
|
|
|
|
channel_len + payload_len + 2);
|
|
|
|
n->channel_len = channel_len;
|
|
|
|
n->payload_len = payload_len;
|
|
|
|
strcpy(n->data, channel);
|
2010-02-16 23:34:57 +01:00
|
|
|
if (payload)
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
strcpy(n->data + channel_len + 1, payload);
|
2010-02-16 23:34:57 +01:00
|
|
|
else
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
n->data[channel_len + 1] = '\0';
|
2010-02-16 23:34:57 +01:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingNotifies == NULL || my_level > pendingNotifies->nestingLevel)
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
NotificationList *notifies;
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
/*
|
|
|
|
* First notify event in current (sub)xact. Note that we allocate the
|
|
|
|
* NotificationList in TopTransactionContext; the nestingLevel might
|
|
|
|
* get changed later by AtSubCommit_Notify.
|
|
|
|
*/
|
|
|
|
notifies = (NotificationList *)
|
|
|
|
MemoryContextAlloc(TopTransactionContext,
|
|
|
|
sizeof(NotificationList));
|
|
|
|
notifies->nestingLevel = my_level;
|
|
|
|
notifies->events = list_make1(n);
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* We certainly don't need a hashtable yet */
|
2019-10-04 14:19:25 +02:00
|
|
|
notifies->hashtab = NULL;
|
|
|
|
notifies->upper = pendingNotifies;
|
|
|
|
pendingNotifies = notifies;
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
/* Now check for duplicates */
|
|
|
|
if (AsyncExistsPendingNotify(n))
|
|
|
|
{
|
|
|
|
/* It's a dup, so forget it */
|
|
|
|
pfree(n);
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* Append more events to existing list */
|
|
|
|
AddEventToPendingNotifies(n);
|
|
|
|
}
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
* queue_listen
|
|
|
|
* Common code for listen, unlisten, unlisten all commands.
|
|
|
|
*
|
|
|
|
* Adds the request to the list of pending actions.
|
2010-02-16 23:34:57 +01:00
|
|
|
* Actual update of the listenChannels list happens during transaction
|
|
|
|
* commit.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
|
|
|
static void
|
2010-02-16 23:34:57 +01:00
|
|
|
queue_listen(ListenActionKind action, const char *channel)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
|
|
|
MemoryContext oldcontext;
|
|
|
|
ListenAction *actrec;
|
2019-10-04 14:19:25 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Unlike Async_Notify, we don't try to collapse out duplicates. It would
|
|
|
|
* be too complicated to ensure we get the right interactions of
|
|
|
|
* conflicting LISTEN/UNLISTEN/UNLISTEN_ALL, and it's unlikely that there
|
|
|
|
* would be any performance benefit anyway in sane applications.
|
|
|
|
*/
|
|
|
|
oldcontext = MemoryContextSwitchTo(CurTransactionContext);
|
|
|
|
|
|
|
|
/* space for terminating null is included in sizeof(ListenAction) */
|
2015-02-21 22:12:14 +01:00
|
|
|
actrec = (ListenAction *) palloc(offsetof(ListenAction, channel) +
|
|
|
|
strlen(channel) + 1);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
actrec->action = action;
|
2010-02-16 23:34:57 +01:00
|
|
|
strcpy(actrec->channel, channel);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingActions == NULL || my_level > pendingActions->nestingLevel)
|
|
|
|
{
|
|
|
|
ActionList *actions;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* First action in current sub(xact). Note that we allocate the
|
|
|
|
* ActionList in TopTransactionContext; the nestingLevel might get
|
|
|
|
* changed later by AtSubCommit_Notify.
|
|
|
|
*/
|
|
|
|
actions = (ActionList *)
|
|
|
|
MemoryContextAlloc(TopTransactionContext, sizeof(ActionList));
|
|
|
|
actions->nestingLevel = my_level;
|
|
|
|
actions->actions = list_make1(actrec);
|
|
|
|
actions->upper = pendingActions;
|
|
|
|
pendingActions = actions;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
pendingActions->actions = lappend(pendingActions->actions, actrec);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* Async_Listen
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
1998-08-30 23:05:27 +02:00
|
|
|
* This is executed by the SQL listen command.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
|
|
|
void
|
2010-02-16 23:34:57 +01:00
|
|
|
Async_Listen(const char *channel)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
|
|
|
if (Trace_notify)
|
2010-02-16 23:34:57 +01:00
|
|
|
elog(DEBUG1, "Async_Listen(%s,%d)", channel, MyProcPid);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
queue_listen(LISTEN_LISTEN, channel);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Async_Unlisten
|
|
|
|
*
|
|
|
|
* This is executed by the SQL unlisten command.
|
|
|
|
*/
|
|
|
|
void
|
2010-02-16 23:34:57 +01:00
|
|
|
Async_Unlisten(const char *channel)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2008-08-30 03:39:14 +02:00
|
|
|
if (Trace_notify)
|
2010-02-16 23:34:57 +01:00
|
|
|
elog(DEBUG1, "Async_Unlisten(%s,%d)", channel, MyProcPid);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2009-02-13 18:12:04 +01:00
|
|
|
/* If we couldn't possibly be listening, no need to queue anything */
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingActions == NULL && !unlistenExitRegistered)
|
2009-02-13 18:12:04 +01:00
|
|
|
return;
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
queue_listen(LISTEN_UNLISTEN, channel);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Async_UnlistenAll
|
|
|
|
*
|
2008-08-30 03:39:14 +02:00
|
|
|
* This is invoked by UNLISTEN * command, and also at backend exit.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
Async_UnlistenAll(void)
|
|
|
|
{
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "Async_UnlistenAll(%d)", MyProcPid);
|
|
|
|
|
2009-02-13 18:12:04 +01:00
|
|
|
/* If we couldn't possibly be listening, no need to queue anything */
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingActions == NULL && !unlistenExitRegistered)
|
2009-02-13 18:12:04 +01:00
|
|
|
return;
|
|
|
|
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
queue_listen(LISTEN_UNLISTEN_ALL, "");
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* SQL function: return a set of the channel names this backend is actively
|
|
|
|
* listening to.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Note: this coding relies on the fact that the listenChannels list cannot
|
|
|
|
* change within a transaction.
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
pg_listening_channels(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
FuncCallContext *funcctx;
|
|
|
|
|
|
|
|
/* stuff done only on the first call of the function */
|
|
|
|
if (SRF_IS_FIRSTCALL())
|
|
|
|
{
|
|
|
|
/* create a function context for cross-call persistence */
|
|
|
|
funcctx = SRF_FIRSTCALL_INIT();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* stuff done on every call of the function */
|
|
|
|
funcctx = SRF_PERCALL_SETUP();
|
|
|
|
|
Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link. We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells. That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.
In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links. Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header. (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)
Of course this is not without downsides. The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.
For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state. We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value. (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.
The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext(). The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell(). However, we no longer
need a previous-cell pointer to delete a list cell efficiently. Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)
There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents. To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen. This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).
There are two notable API differences from the previous code:
* lnext() now requires the List's header pointer in addition to the
current cell's address.
* list_delete_cell() no longer requires a previous-cell argument.
These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.
Programmers should be aware of these significant performance changes:
* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.
* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.) Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long. Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.
* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index. These have the
data-movement penalty explained above, but there's no search penalty.
* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists. The second argument is
now declared "const List *" to reflect that it isn't changed.
This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests. As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.
Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit. Code using those should have been gone a dozen years ago.
Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 19:41:58 +02:00
|
|
|
if (funcctx->call_cntr < list_length(listenChannels))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link. We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells. That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.
In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links. Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header. (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)
Of course this is not without downsides. The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.
For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state. We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value. (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.
The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext(). The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell(). However, we no longer
need a previous-cell pointer to delete a list cell efficiently. Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)
There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents. To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen. This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).
There are two notable API differences from the previous code:
* lnext() now requires the List's header pointer in addition to the
current cell's address.
* list_delete_cell() no longer requires a previous-cell argument.
These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.
Programmers should be aware of these significant performance changes:
* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.
* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.) Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long. Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.
* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index. These have the
data-movement penalty explained above, but there's no search penalty.
* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists. The second argument is
now declared "const List *" to reflect that it isn't changed.
This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests. As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.
Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit. Code using those should have been gone a dozen years ago.
Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 19:41:58 +02:00
|
|
|
char *channel = (char *) list_nth(listenChannels,
|
|
|
|
funcctx->call_cntr);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
SRF_RETURN_NEXT(funcctx, CStringGetTextDatum(channel));
|
|
|
|
}
|
|
|
|
|
|
|
|
SRF_RETURN_DONE(funcctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Async_UnlistenOnExit
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* This is executed at backend exit if we have done any LISTENs in this
|
|
|
|
* backend. It might not be necessary anymore, if the user UNLISTENed
|
|
|
|
* everything, but we don't try to detect that case.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
Async_UnlistenOnExit(int code, Datum arg)
|
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
Exec_UnlistenAllCommit();
|
2013-02-13 18:48:05 +01:00
|
|
|
asyncQueueUnregister();
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* AtPrepare_Notify
|
1998-08-30 23:05:27 +02:00
|
|
|
*
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
* This is called at the prepare phase of a two-phase
|
|
|
|
* transaction. Save the state for possible commit later.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
AtPrepare_Notify(void)
|
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
/* It's not allowed to have any pending LISTEN/UNLISTEN/NOTIFY actions */
|
|
|
|
if (pendingActions || pendingNotifies)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2012-02-24 10:04:45 +01:00
|
|
|
errmsg("cannot PREPARE a transaction that has executed LISTEN, UNLISTEN, or NOTIFY")));
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* PreCommit_Notify
|
|
|
|
*
|
|
|
|
* This is called at transaction commit, before actually committing to
|
|
|
|
* clog.
|
|
|
|
*
|
|
|
|
* If there are pending LISTEN actions, make sure we are listed in the
|
|
|
|
* shared-memory listener array. This must happen before commit to
|
|
|
|
* ensure we don't miss any notifies from transactions that commit
|
|
|
|
* just after ours.
|
|
|
|
*
|
|
|
|
* If there are outbound notify requests in the pendingNotifies list,
|
|
|
|
* add them to the global queue. We do that before commit so that
|
|
|
|
* we can still throw error if we run out of queue space.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
PreCommit_Notify(void)
|
|
|
|
{
|
|
|
|
ListCell *p;
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
if (!pendingActions && !pendingNotifies)
|
2010-02-16 23:34:57 +01:00
|
|
|
return; /* no relevant statements in this xact */
|
|
|
|
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "PreCommit_Notify");
|
|
|
|
|
|
|
|
/* Preflight for any pending listen/unlisten actions */
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingActions != NULL)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
foreach(p, pendingActions->actions)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
ListenAction *actrec = (ListenAction *) lfirst(p);
|
|
|
|
|
|
|
|
switch (actrec->action)
|
|
|
|
{
|
|
|
|
case LISTEN_LISTEN:
|
|
|
|
Exec_ListenPreCommit();
|
|
|
|
break;
|
|
|
|
case LISTEN_UNLISTEN:
|
|
|
|
/* there is no Exec_UnlistenPreCommit() */
|
|
|
|
break;
|
|
|
|
case LISTEN_UNLISTEN_ALL:
|
|
|
|
/* there is no Exec_UnlistenAllPreCommit() */
|
|
|
|
break;
|
|
|
|
}
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
/* Queue any pending notifies (must happen after the above) */
|
2010-02-16 23:34:57 +01:00
|
|
|
if (pendingNotifies)
|
|
|
|
{
|
|
|
|
ListCell *nextNotify;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that we have an XID assigned to the current transaction.
|
|
|
|
* GetCurrentTransactionId is cheap if we already have an XID, but not
|
|
|
|
* so cheap if we don't, and we'd prefer not to do that work while
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* holding NotifyQueueLock.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
(void) GetCurrentTransactionId();
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Serialize writers by acquiring a special lock that we hold till
|
|
|
|
* after commit. This ensures that queue entries appear in commit
|
|
|
|
* order, and in particular that there are never uncommitted queue
|
|
|
|
* entries ahead of committed ones, so an uncommitted transaction
|
|
|
|
* can't block delivery of deliverable notifications.
|
|
|
|
*
|
|
|
|
* We use a heavyweight lock so that it'll automatically be released
|
|
|
|
* after either commit or abort. This also allows deadlocks to be
|
|
|
|
* detected, though really a deadlock shouldn't be possible here.
|
|
|
|
*
|
|
|
|
* The lock is on "database 0", which is pretty ugly but it doesn't
|
|
|
|
* seem worth inventing a special locktag category just for this.
|
|
|
|
* (Historical note: before PG 9.0, a similar lock on "database 0" was
|
|
|
|
* used by the flatfiles mechanism.)
|
|
|
|
*/
|
|
|
|
LockSharedObject(DatabaseRelationId, InvalidOid, 0,
|
|
|
|
AccessExclusiveLock);
|
|
|
|
|
|
|
|
/* Now push the notifications into the queue */
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
nextNotify = list_head(pendingNotifies->events);
|
2010-02-16 23:34:57 +01:00
|
|
|
while (nextNotify != NULL)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Add the pending notifications to the queue. We acquire and
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* release NotifyQueueLock once per page, which might be overkill
|
2010-02-16 23:34:57 +01:00
|
|
|
* but it does allow readers to get in while we're doing this.
|
|
|
|
*
|
|
|
|
* A full queue is very uncommon and should really not happen,
|
|
|
|
* given that we have so much space available in the SLRU pages.
|
|
|
|
* Nevertheless we need to deal with this possibility. Note that
|
|
|
|
* when we get here we are in the process of committing our
|
|
|
|
* transaction, but we have not yet committed to clog, so at this
|
|
|
|
* point in time we can still roll the transaction back.
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
2010-02-16 23:34:57 +01:00
|
|
|
asyncQueueFillWarning();
|
|
|
|
if (asyncQueueIsFull())
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
|
|
|
|
errmsg("too many notifications in the NOTIFY queue")));
|
|
|
|
nextNotify = asyncQueueAddEntries(nextNotify);
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
|
|
|
|
/* Note that we don't clear pendingNotifies; AtCommit_Notify will. */
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* AtCommit_Notify
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* This is called at transaction commit, after committing to clog.
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Update listenChannels and clear transaction-local state.
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
*
|
|
|
|
* If we issued any notifications in the transaction, send signals to
|
|
|
|
* listening backends (possibly including ourselves) to process them.
|
|
|
|
* Also, if we filled enough queue pages with new notifies, try to
|
|
|
|
* advance the queue tail pointer.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
|
|
|
void
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
AtCommit_Notify(void)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
ListCell *p;
|
|
|
|
|
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Allow transactions that have not executed LISTEN/UNLISTEN/NOTIFY to
|
|
|
|
* return as soon as possible
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
if (!pendingActions && !pendingNotifies)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
return;
|
|
|
|
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "AtCommit_Notify");
|
|
|
|
|
|
|
|
/* Perform any pending listen/unlisten actions */
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingActions != NULL)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
foreach(p, pendingActions->actions)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
ListenAction *actrec = (ListenAction *) lfirst(p);
|
|
|
|
|
|
|
|
switch (actrec->action)
|
|
|
|
{
|
|
|
|
case LISTEN_LISTEN:
|
|
|
|
Exec_ListenCommit(actrec->channel);
|
|
|
|
break;
|
|
|
|
case LISTEN_UNLISTEN:
|
|
|
|
Exec_UnlistenCommit(actrec->channel);
|
|
|
|
break;
|
|
|
|
case LISTEN_UNLISTEN_ALL:
|
|
|
|
Exec_UnlistenAllCommit();
|
|
|
|
break;
|
|
|
|
}
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
|
2013-02-13 18:48:05 +01:00
|
|
|
/* If no longer listening to anything, get out of listener array */
|
|
|
|
if (amRegisteredListener && listenChannels == NIL)
|
|
|
|
asyncQueueUnregister();
|
2010-02-16 23:34:57 +01:00
|
|
|
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
/*
|
|
|
|
* Send signals to listening backends. We need do this only if there are
|
|
|
|
* pending notifies, which were previously added to the shared queue by
|
|
|
|
* PreCommit_Notify().
|
|
|
|
*/
|
|
|
|
if (pendingNotifies != NULL)
|
|
|
|
SignalBackends();
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If it's time to try to advance the global tail pointer, do that.
|
|
|
|
*
|
|
|
|
* (It might seem odd to do this in the sender, when more than likely the
|
|
|
|
* listeners won't yet have read the messages we just sent. However,
|
|
|
|
* there's less contention if only the sender does it, and there is little
|
|
|
|
* need for urgency in advancing the global tail. So this typically will
|
|
|
|
* be clearing out messages that were sent some time ago.)
|
|
|
|
*/
|
|
|
|
if (tryAdvanceTail)
|
|
|
|
{
|
|
|
|
tryAdvanceTail = false;
|
|
|
|
asyncQueueAdvanceTail();
|
|
|
|
}
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* And clean up */
|
|
|
|
ClearPendingActionsAndNotifies();
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exec_ListenPreCommit --- subroutine for PreCommit_Notify
|
|
|
|
*
|
|
|
|
* This function must make sure we are ready to catch any incoming messages.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
Exec_ListenPreCommit(void)
|
|
|
|
{
|
2015-10-01 05:32:22 +02:00
|
|
|
QueuePosition head;
|
|
|
|
QueuePosition max;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
BackendId prevListener;
|
2015-10-01 05:32:22 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Nothing to do if we are already listening to something, nor if we
|
|
|
|
* already ran this routine in this transaction.
|
|
|
|
*/
|
2013-02-13 18:48:05 +01:00
|
|
|
if (amRegisteredListener)
|
2010-02-16 23:34:57 +01:00
|
|
|
return;
|
|
|
|
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "Exec_ListenPreCommit(%d)", MyProcPid);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Before registering, make sure we will unlisten before dying. (Note:
|
|
|
|
* this action does not get undone if we abort later.)
|
|
|
|
*/
|
|
|
|
if (!unlistenExitRegistered)
|
|
|
|
{
|
2013-12-18 18:57:20 +01:00
|
|
|
before_shmem_exit(Async_UnlistenOnExit, 0);
|
2010-02-16 23:34:57 +01:00
|
|
|
unlistenExitRegistered = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is our first LISTEN, so establish our pointer.
|
|
|
|
*
|
|
|
|
* We set our pointer to the global tail pointer and then move it forward
|
|
|
|
* over already-committed notifications. This ensures we cannot miss any
|
|
|
|
* not-yet-committed notifications. We might get a few more but that
|
|
|
|
* doesn't hurt.
|
2015-10-01 05:32:22 +02:00
|
|
|
*
|
|
|
|
* In some scenarios there might be a lot of committed notifications that
|
|
|
|
* have not yet been pruned away (because some backend is being lazy about
|
|
|
|
* reading them). To reduce our startup time, we can look at other
|
|
|
|
* backends and adopt the maximum "pos" pointer of any backend that's in
|
|
|
|
* our database; any notifications it's already advanced over are surely
|
|
|
|
* committed and need not be re-examined by us. (We must consider only
|
|
|
|
* backends connected to our DB, because others will not have bothered to
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
* check committed-ness of notifications in our DB.)
|
2015-10-01 05:32:22 +02:00
|
|
|
*
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
* We need exclusive lock here so we can look at other backends' entries
|
|
|
|
* and manipulate the list links.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
2015-10-01 05:32:22 +02:00
|
|
|
head = QUEUE_HEAD;
|
|
|
|
max = QUEUE_TAIL;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
prevListener = InvalidBackendId;
|
|
|
|
for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
|
2015-10-01 05:32:22 +02:00
|
|
|
{
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
if (QUEUE_BACKEND_DBOID(i) == MyDatabaseId)
|
|
|
|
max = QUEUE_POS_MAX(max, QUEUE_BACKEND_POS(i));
|
|
|
|
/* Also find last listening backend before this one */
|
|
|
|
if (i < MyBackendId)
|
|
|
|
prevListener = i;
|
2015-10-01 05:32:22 +02:00
|
|
|
}
|
|
|
|
QUEUE_BACKEND_POS(MyBackendId) = max;
|
2010-02-16 23:34:57 +01:00
|
|
|
QUEUE_BACKEND_PID(MyBackendId) = MyProcPid;
|
2015-10-01 05:32:22 +02:00
|
|
|
QUEUE_BACKEND_DBOID(MyBackendId) = MyDatabaseId;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
/* Insert backend into list of listeners at correct position */
|
|
|
|
if (prevListener > 0)
|
|
|
|
{
|
|
|
|
QUEUE_NEXT_LISTENER(MyBackendId) = QUEUE_NEXT_LISTENER(prevListener);
|
|
|
|
QUEUE_NEXT_LISTENER(prevListener) = MyBackendId;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
QUEUE_NEXT_LISTENER(MyBackendId) = QUEUE_FIRST_LISTENER;
|
|
|
|
QUEUE_FIRST_LISTENER = MyBackendId;
|
|
|
|
}
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
2013-02-13 18:48:05 +01:00
|
|
|
/* Now we are listed in the global array, so remember we're listening */
|
|
|
|
amRegisteredListener = true;
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
2019-11-25 00:03:39 +01:00
|
|
|
* Try to move our pointer forward as far as possible. This will skip
|
|
|
|
* over already-committed notifications, which we want to do because they
|
|
|
|
* might be quite stale. Note that we are not yet listening on anything,
|
|
|
|
* so we won't deliver such notifications to our frontend. Also, although
|
|
|
|
* our transaction might have executed NOTIFY, those message(s) aren't
|
|
|
|
* queued yet so we won't skip them here.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
2015-10-01 05:32:22 +02:00
|
|
|
if (!QUEUE_POS_EQUAL(max, head))
|
|
|
|
asyncQueueReadAllNotifications();
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exec_ListenCommit --- subroutine for AtCommit_Notify
|
|
|
|
*
|
|
|
|
* Add the channel to the list of channels we are listening on.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
Exec_ListenCommit(const char *channel)
|
|
|
|
{
|
|
|
|
MemoryContext oldcontext;
|
|
|
|
|
|
|
|
/* Do nothing if we are already listening on this channel */
|
|
|
|
if (IsListeningOn(channel))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Add the new channel name to listenChannels.
|
|
|
|
*
|
|
|
|
* XXX It is theoretically possible to get an out-of-memory failure here,
|
|
|
|
* which would be bad because we already committed. For the moment it
|
|
|
|
* doesn't seem worth trying to guard against that, but maybe improve this
|
|
|
|
* later.
|
|
|
|
*/
|
|
|
|
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
|
|
|
|
listenChannels = lappend(listenChannels, pstrdup(channel));
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exec_UnlistenCommit --- subroutine for AtCommit_Notify
|
|
|
|
*
|
|
|
|
* Remove the specified channel name from listenChannels.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
Exec_UnlistenCommit(const char *channel)
|
|
|
|
{
|
|
|
|
ListCell *q;
|
|
|
|
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "Exec_UnlistenCommit(%s,%d)", channel, MyProcPid);
|
|
|
|
|
|
|
|
foreach(q, listenChannels)
|
|
|
|
{
|
|
|
|
char *lchan = (char *) lfirst(q);
|
|
|
|
|
|
|
|
if (strcmp(lchan, channel) == 0)
|
|
|
|
{
|
Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link. We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells. That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.
In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links. Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header. (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)
Of course this is not without downsides. The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.
For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state. We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value. (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.
The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext(). The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell(). However, we no longer
need a previous-cell pointer to delete a list cell efficiently. Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)
There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents. To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen. This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).
There are two notable API differences from the previous code:
* lnext() now requires the List's header pointer in addition to the
current cell's address.
* list_delete_cell() no longer requires a previous-cell argument.
These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.
Programmers should be aware of these significant performance changes:
* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.
* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.) Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long. Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.
* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index. These have the
data-movement penalty explained above, but there's no search penalty.
* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists. The second argument is
now declared "const List *" to reflect that it isn't changed.
This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests. As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.
Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit. Code using those should have been gone a dozen years ago.
Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 19:41:58 +02:00
|
|
|
listenChannels = foreach_delete_current(listenChannels, q);
|
2010-02-16 23:34:57 +01:00
|
|
|
pfree(lchan);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* We do not complain about unlistening something not being listened;
|
|
|
|
* should we?
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exec_UnlistenAllCommit --- subroutine for AtCommit_Notify
|
|
|
|
*
|
|
|
|
* Unlisten on all channels for this backend.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
Exec_UnlistenAllCommit(void)
|
|
|
|
{
|
|
|
|
if (Trace_notify)
|
|
|
|
elog(DEBUG1, "Exec_UnlistenAllCommit(%d)", MyProcPid);
|
|
|
|
|
|
|
|
list_free_deep(listenChannels);
|
|
|
|
listenChannels = NIL;
|
|
|
|
}
|
|
|
|
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Test whether we are actively listening on the given channel name.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Note: this function is executed for every notification found in the queue.
|
|
|
|
* Perhaps it is worth further optimization, eg convert the list to a sorted
|
|
|
|
* array so we can binary-search it. In practice the list is likely to be
|
|
|
|
* fairly short, though.
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
static bool
|
|
|
|
IsListeningOn(const char *channel)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
ListCell *p;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
foreach(p, listenChannels)
|
2001-06-12 07:55:50 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
char *lchan = (char *) lfirst(p);
|
2001-06-12 07:55:50 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
if (strcmp(lchan, channel) == 0)
|
|
|
|
return true;
|
2001-06-12 07:55:50 +02:00
|
|
|
}
|
2010-02-16 23:34:57 +01:00
|
|
|
return false;
|
|
|
|
}
|
2001-06-12 07:55:50 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Remove our entry from the listeners array when we are no longer listening
|
|
|
|
* on any channel. NB: must not fail if we're already not listening.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
asyncQueueUnregister(void)
|
|
|
|
{
|
|
|
|
Assert(listenChannels == NIL); /* else caller error */
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2013-02-13 18:48:05 +01:00
|
|
|
if (!amRegisteredListener) /* nothing to do */
|
|
|
|
return;
|
|
|
|
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
/*
|
|
|
|
* Need exclusive lock here to manipulate list links.
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
/* Mark our entry as invalid */
|
2010-02-16 23:34:57 +01:00
|
|
|
QUEUE_BACKEND_PID(MyBackendId) = InvalidPid;
|
2015-10-01 05:32:22 +02:00
|
|
|
QUEUE_BACKEND_DBOID(MyBackendId) = InvalidOid;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
/* and remove it from the list */
|
|
|
|
if (QUEUE_FIRST_LISTENER == MyBackendId)
|
|
|
|
QUEUE_FIRST_LISTENER = QUEUE_NEXT_LISTENER(MyBackendId);
|
|
|
|
else
|
|
|
|
{
|
|
|
|
for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
|
|
|
|
{
|
|
|
|
if (QUEUE_NEXT_LISTENER(i) == MyBackendId)
|
|
|
|
{
|
|
|
|
QUEUE_NEXT_LISTENER(i) = QUEUE_NEXT_LISTENER(MyBackendId);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
QUEUE_NEXT_LISTENER(MyBackendId) = InvalidBackendId;
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
2013-02-13 18:48:05 +01:00
|
|
|
/* mark ourselves as no longer listed in the global array */
|
|
|
|
amRegisteredListener = false;
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
1998-10-06 04:40:09 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Test whether there is room to insert more notification messages.
|
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* Caller must hold at least shared NotifyQueueLock.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
asyncQueueIsFull(void)
|
|
|
|
{
|
|
|
|
int nexthead;
|
|
|
|
int boundary;
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* The queue is full if creating a new head page would create a page that
|
|
|
|
* logically precedes the current global tail pointer, ie, the head
|
|
|
|
* pointer would wrap around compared to the tail. We cannot create such
|
|
|
|
* a head page for fear of confusing slru.c. For safety we round the tail
|
2021-01-16 21:21:35 +01:00
|
|
|
* pointer back to a segment boundary (truncation logic in
|
|
|
|
* asyncQueueAdvanceTail does not do this, so doing it here is optional).
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* Note that this test is *not* dependent on how much space there is on
|
|
|
|
* the current head page. This is necessary because asyncQueueAddEntries
|
|
|
|
* might try to create the next head page in any case.
|
|
|
|
*/
|
|
|
|
nexthead = QUEUE_POS_PAGE(QUEUE_HEAD) + 1;
|
|
|
|
if (nexthead > QUEUE_MAX_PAGE)
|
|
|
|
nexthead = 0; /* wrap around */
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
boundary = QUEUE_STOP_PAGE;
|
2010-02-16 23:34:57 +01:00
|
|
|
boundary -= boundary % SLRU_PAGES_PER_SEGMENT;
|
2011-09-28 16:32:38 +02:00
|
|
|
return asyncQueuePagePrecedes(nexthead, boundary);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
2001-06-12 07:55:50 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Advance the QueuePosition to the next entry, assuming that the current
|
|
|
|
* entry is of length entryLength. If we jump to a new page the function
|
|
|
|
* returns true, else false.
|
|
|
|
*/
|
|
|
|
static bool
|
2015-01-26 17:57:33 +01:00
|
|
|
asyncQueueAdvance(volatile QueuePosition *position, int entryLength)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
int pageno = QUEUE_POS_PAGE(*position);
|
|
|
|
int offset = QUEUE_POS_OFFSET(*position);
|
|
|
|
bool pageJump = false;
|
2000-05-14 05:18:35 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Move to the next writing position: First jump over what we have just
|
|
|
|
* written or read.
|
|
|
|
*/
|
|
|
|
offset += entryLength;
|
|
|
|
Assert(offset <= QUEUE_PAGESIZE);
|
1998-10-06 04:40:09 +02:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* In a second step check if another entry can possibly be written to the
|
|
|
|
* page. If so, stay here, we have reached the next position. If not, then
|
|
|
|
* we need to move on to the next page.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
if (offset + QUEUEALIGN(AsyncQueueEntryEmptySize) > QUEUE_PAGESIZE)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
pageno++;
|
|
|
|
if (pageno > QUEUE_MAX_PAGE)
|
|
|
|
pageno = 0; /* wrap around */
|
|
|
|
offset = 0;
|
|
|
|
pageJump = true;
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
SET_QUEUE_POS(*position, pageno, offset);
|
|
|
|
return pageJump;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Fill the AsyncQueueEntry at *qe with an outbound notification message.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
static void
|
2010-02-16 23:34:57 +01:00
|
|
|
asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
size_t channellen = n->channel_len;
|
|
|
|
size_t payloadlen = n->payload_len;
|
2010-02-16 23:34:57 +01:00
|
|
|
int entryLength;
|
|
|
|
|
|
|
|
Assert(channellen < NAMEDATALEN);
|
|
|
|
Assert(payloadlen < NOTIFY_PAYLOAD_MAX_LENGTH);
|
|
|
|
|
|
|
|
/* The terminators are already included in AsyncQueueEntryEmptySize */
|
|
|
|
entryLength = AsyncQueueEntryEmptySize + payloadlen + channellen;
|
|
|
|
entryLength = QUEUEALIGN(entryLength);
|
|
|
|
qe->length = entryLength;
|
|
|
|
qe->dboid = MyDatabaseId;
|
|
|
|
qe->xid = GetCurrentTransactionId();
|
|
|
|
qe->srcPid = MyProcPid;
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
memcpy(qe->data, n->data, channellen + payloadlen + 2);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Add pending notifications to the queue.
|
|
|
|
*
|
|
|
|
* We go page by page here, i.e. we stop once we have to go to a new page but
|
|
|
|
* we will be called again and then fill that next page. If an entry does not
|
|
|
|
* fit into the current page, we write a dummy entry with an InvalidOid as the
|
|
|
|
* database OID in order to fill the page. So every page is always used up to
|
|
|
|
* the last byte which simplifies reading the page later.
|
|
|
|
*
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
* We are passed the list cell (in pendingNotifies->events) containing the next
|
Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link. We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells. That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.
In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links. Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header. (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)
Of course this is not without downsides. The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.
For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state. We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value. (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.
The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext(). The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell(). However, we no longer
need a previous-cell pointer to delete a list cell efficiently. Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)
There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents. To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen. This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).
There are two notable API differences from the previous code:
* lnext() now requires the List's header pointer in addition to the
current cell's address.
* list_delete_cell() no longer requires a previous-cell argument.
These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.
Programmers should be aware of these significant performance changes:
* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.
* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.) Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long. Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.
* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index. These have the
data-movement penalty explained above, but there's no search penalty.
* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists. The second argument is
now declared "const List *" to reflect that it isn't changed.
This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests. As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.
Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit. Code using those should have been gone a dozen years ago.
Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 19:41:58 +02:00
|
|
|
* notification to write and return the first still-unwritten cell back.
|
|
|
|
* Eventually we will return NULL indicating all is done.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* We are holding NotifyQueueLock already from the caller and grab
|
|
|
|
* NotifySLRULock locally in this function.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
static ListCell *
|
|
|
|
asyncQueueAddEntries(ListCell *nextNotify)
|
|
|
|
{
|
|
|
|
AsyncQueueEntry qe;
|
2012-06-29 06:51:34 +02:00
|
|
|
QueuePosition queue_head;
|
2010-02-16 23:34:57 +01:00
|
|
|
int pageno;
|
|
|
|
int offset;
|
|
|
|
int slotno;
|
2001-06-12 07:55:50 +02:00
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
|
|
|
|
LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
2012-06-29 06:51:34 +02:00
|
|
|
/*
|
|
|
|
* We work with a local copy of QUEUE_HEAD, which we write back to shared
|
|
|
|
* memory upon exiting. The reason for this is that if we have to advance
|
|
|
|
* to a new page, SimpleLruZeroPage might fail (out of disk space, for
|
|
|
|
* instance), and we must not advance QUEUE_HEAD if it does. (Otherwise,
|
|
|
|
* subsequent insertions would try to put entries into a page that slru.c
|
|
|
|
* thinks doesn't exist yet.) So, use a local position variable. Note
|
|
|
|
* that if we do fail, any already-inserted queue entries are forgotten;
|
|
|
|
* this is okay, since they'd be useless anyway after our transaction
|
|
|
|
* rolls back.
|
|
|
|
*/
|
|
|
|
queue_head = QUEUE_HEAD;
|
|
|
|
|
Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste. Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.
The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics. Delaying the
initialization into the first child that does a NOTIFY fixes that.
Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue. Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.
Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-14 04:48:09 +02:00
|
|
|
/*
|
|
|
|
* If this is the first write since the postmaster started, we need to
|
|
|
|
* initialize the first page of the async SLRU. Otherwise, the current
|
|
|
|
* page should be initialized already, so just fetch it.
|
|
|
|
*
|
|
|
|
* (We could also take the first path when the SLRU position has just
|
|
|
|
* wrapped around, but re-zeroing the page is harmless in that case.)
|
|
|
|
*/
|
2012-06-29 06:51:34 +02:00
|
|
|
pageno = QUEUE_POS_PAGE(queue_head);
|
Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste. Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.
The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics. Delaying the
initialization into the first child that does a NOTIFY fixes that.
Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue. Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.
Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-14 04:48:09 +02:00
|
|
|
if (QUEUE_POS_IS_ZERO(queue_head))
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
slotno = SimpleLruZeroPage(NotifyCtl, pageno);
|
Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste. Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.
The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics. Delaying the
initialization into the first child that does a NOTIFY fixes that.
Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue. Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.
Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-14 04:48:09 +02:00
|
|
|
else
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
|
Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste. Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.
The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics. Delaying the
initialization into the first child that does a NOTIFY fixes that.
Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue. Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.
Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-14 04:48:09 +02:00
|
|
|
InvalidTransactionId);
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* Note we mark the page dirty before writing in it */
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
NotifyCtl->shared->page_dirty[slotno] = true;
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
while (nextNotify != NULL)
|
2000-11-16 23:30:52 +01:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
Notification *n = (Notification *) lfirst(nextNotify);
|
|
|
|
|
|
|
|
/* Construct a valid queue entry in local variable qe */
|
|
|
|
asyncQueueNotificationToEntry(n, &qe);
|
|
|
|
|
2012-06-29 06:51:34 +02:00
|
|
|
offset = QUEUE_POS_OFFSET(queue_head);
|
2001-06-12 07:55:50 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* Check whether the entry really fits on the current page */
|
|
|
|
if (offset + qe.length <= QUEUE_PAGESIZE)
|
2001-06-12 07:55:50 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
/* OK, so advance nextNotify past this item */
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
nextNotify = lnext(pendingNotifies->events, nextNotify);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Write a dummy entry to fill up the page. Actually readers will
|
|
|
|
* only check dboid and since it won't match any reader's database
|
|
|
|
* OID, they will ignore this entry and move on.
|
|
|
|
*/
|
|
|
|
qe.length = QUEUE_PAGESIZE - offset;
|
|
|
|
qe.dboid = InvalidOid;
|
|
|
|
qe.data[0] = '\0'; /* empty channel */
|
|
|
|
qe.data[1] = '\0'; /* empty payload */
|
|
|
|
}
|
2001-10-25 07:50:21 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* Now copy qe into the shared buffer page */
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
memcpy(NotifyCtl->shared->page_buffer[slotno] + offset,
|
2010-02-16 23:34:57 +01:00
|
|
|
&qe,
|
|
|
|
qe.length);
|
|
|
|
|
2012-06-29 06:51:34 +02:00
|
|
|
/* Advance queue_head appropriately, and detect if page is full */
|
|
|
|
if (asyncQueueAdvance(&(queue_head), qe.length))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
2001-06-12 07:55:50 +02:00
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Page is full, so we're done here, but first fill the next page
|
|
|
|
* with zeroes. The reason to do this is to ensure that slru.c's
|
|
|
|
* idea of the head page is always the same as ours, which avoids
|
|
|
|
* boundary problems in SimpleLruTruncate. The test in
|
|
|
|
* asyncQueueIsFull() ensured that there is room to create this
|
|
|
|
* page without overrunning the queue.
|
2001-06-12 07:55:50 +02:00
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
slotno = SimpleLruZeroPage(NotifyCtl, QUEUE_POS_PAGE(queue_head));
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the new page address is a multiple of QUEUE_CLEANUP_DELAY,
|
|
|
|
* set flag to remember that we should try to advance the tail
|
|
|
|
* pointer (we don't want to actually do that right here).
|
|
|
|
*/
|
|
|
|
if (QUEUE_POS_PAGE(queue_head) % QUEUE_CLEANUP_DELAY == 0)
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
tryAdvanceTail = true;
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* And exit the loop */
|
2001-06-12 07:55:50 +02:00
|
|
|
break;
|
|
|
|
}
|
2000-11-16 23:30:52 +01:00
|
|
|
}
|
2001-06-12 07:55:50 +02:00
|
|
|
|
2012-06-29 06:51:34 +02:00
|
|
|
/* Success, so update the global QUEUE_HEAD */
|
|
|
|
QUEUE_HEAD = queue_head;
|
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifySLRULock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
return nextNotify;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
1998-08-30 23:05:27 +02:00
|
|
|
/*
|
2015-07-17 15:12:03 +02:00
|
|
|
* SQL function to return the fraction of the notification queue currently
|
|
|
|
* occupied.
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
pg_notification_queue_usage(PG_FUNCTION_ARGS)
|
|
|
|
{
|
|
|
|
double usage;
|
|
|
|
|
Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail. Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability. To fix, just make the function call
asyncQueueAdvanceTail before computing its result. That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.
Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before). Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker. There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.
In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 20:09:33 +01:00
|
|
|
/* Advance the queue tail so we don't report a too-large result */
|
|
|
|
asyncQueueAdvanceTail();
|
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_SHARED);
|
2015-07-17 15:12:03 +02:00
|
|
|
usage = asyncQueueUsage();
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2015-07-17 15:12:03 +02:00
|
|
|
|
|
|
|
PG_RETURN_FLOAT8(usage);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the fraction of the queue that is currently occupied.
|
1998-08-30 23:05:27 +02:00
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* The caller must hold NotifyQueueLock in (at least) shared mode.
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
*
|
|
|
|
* Note: we measure the distance to the logical tail page, not the physical
|
|
|
|
* tail page. In some sense that's wrong, but the relative position of the
|
|
|
|
* physical tail is affected by details such as SLRU segment boundaries,
|
|
|
|
* so that a result based on that is unpleasantly unstable.
|
1998-08-30 23:05:27 +02:00
|
|
|
*/
|
2015-07-17 15:12:03 +02:00
|
|
|
static double
|
|
|
|
asyncQueueUsage(void)
|
1998-08-30 23:05:27 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
int headPage = QUEUE_POS_PAGE(QUEUE_HEAD);
|
|
|
|
int tailPage = QUEUE_POS_PAGE(QUEUE_TAIL);
|
|
|
|
int occupied;
|
|
|
|
|
|
|
|
occupied = headPage - tailPage;
|
|
|
|
|
|
|
|
if (occupied == 0)
|
2015-07-17 15:12:03 +02:00
|
|
|
return (double) 0; /* fast exit for common case */
|
2010-02-26 03:01:40 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
if (occupied < 0)
|
|
|
|
{
|
|
|
|
/* head has wrapped around, tail not yet */
|
|
|
|
occupied += QUEUE_MAX_PAGE + 1;
|
|
|
|
}
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2015-07-17 15:12:03 +02:00
|
|
|
return (double) occupied / (double) ((QUEUE_MAX_PAGE + 1) / 2);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check whether the queue is at least half full, and emit a warning if so.
|
|
|
|
*
|
|
|
|
* This is unlikely given the size of the queue, but possible.
|
|
|
|
* The warnings show up at most once every QUEUE_FULL_WARN_INTERVAL.
|
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* Caller must hold exclusive NotifyQueueLock.
|
2015-07-17 15:12:03 +02:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
asyncQueueFillWarning(void)
|
|
|
|
{
|
|
|
|
double fillDegree;
|
|
|
|
TimestampTz t;
|
2010-02-16 23:34:57 +01:00
|
|
|
|
2015-07-17 15:12:03 +02:00
|
|
|
fillDegree = asyncQueueUsage();
|
2010-02-16 23:34:57 +01:00
|
|
|
if (fillDegree < 0.5)
|
|
|
|
return;
|
|
|
|
|
|
|
|
t = GetCurrentTimestamp();
|
1998-10-06 04:40:09 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
if (TimestampDifferenceExceeds(asyncQueueControl->lastQueueFillWarn,
|
|
|
|
t, QUEUE_FULL_WARN_INTERVAL))
|
|
|
|
{
|
|
|
|
QueuePosition min = QUEUE_HEAD;
|
|
|
|
int32 minPid = InvalidPid;
|
|
|
|
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
Assert(QUEUE_BACKEND_PID(i) != InvalidPid);
|
|
|
|
min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i));
|
|
|
|
if (QUEUE_POS_EQUAL(min, QUEUE_BACKEND_POS(i)))
|
|
|
|
minPid = QUEUE_BACKEND_PID(i);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
ereport(WARNING,
|
2010-04-05 02:42:24 +02:00
|
|
|
(errmsg("NOTIFY queue is %.0f%% full", fillDegree * 100),
|
2010-02-16 23:34:57 +01:00
|
|
|
(minPid != InvalidPid ?
|
2010-04-05 02:42:24 +02:00
|
|
|
errdetail("The server process with PID %d is among those with the oldest transactions.", minPid)
|
2010-02-16 23:34:57 +01:00
|
|
|
: 0),
|
|
|
|
(minPid != InvalidPid ?
|
2010-04-05 02:42:24 +02:00
|
|
|
errhint("The NOTIFY queue cannot be emptied until that process ends its current transaction.")
|
2010-02-16 23:34:57 +01:00
|
|
|
: 0)));
|
1998-10-06 04:40:09 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
asyncQueueControl->lastQueueFillWarn = t;
|
|
|
|
}
|
1998-08-30 23:05:27 +02:00
|
|
|
}
|
1998-09-01 06:40:42 +02:00
|
|
|
|
1998-08-30 23:05:27 +02:00
|
|
|
/*
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* Send signals to listening backends.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* Normally we signal only backends in our own database, since only those
|
|
|
|
* backends could be interested in notifies we send. However, if there's
|
|
|
|
* notify traffic in our database but no traffic in another database that
|
|
|
|
* does have listener(s), those listeners will fall further and further
|
|
|
|
* behind. Waken them anyway if they're far enough behind, so that they'll
|
|
|
|
* advance their queue position pointers, allowing the global tail to advance.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
2020-06-07 15:06:51 +02:00
|
|
|
* Since we know the BackendId and the Pid the signaling is quite cheap.
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
*
|
|
|
|
* This is called during CommitTransaction(), so it's important for it
|
|
|
|
* to have very low probability of failure.
|
1998-08-30 23:05:27 +02:00
|
|
|
*/
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
static void
|
2010-02-16 23:34:57 +01:00
|
|
|
SignalBackends(void)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
int32 *pids;
|
|
|
|
BackendId *ids;
|
|
|
|
int count;
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* Identify backends that we need to signal. We don't want to send
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* signals while holding the NotifyQueueLock, so this loop just builds a
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
* list of target PIDs.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* XXX in principle these pallocs could fail, which would be bad. Maybe
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* preallocate the arrays? They're not that large, though.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
pids = (int32 *) palloc(MaxBackends * sizeof(int32));
|
|
|
|
ids = (BackendId *) palloc(MaxBackends * sizeof(BackendId));
|
|
|
|
count = 0;
|
2000-04-12 19:17:23 +02:00
|
|
|
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
int32 pid = QUEUE_BACKEND_PID(i);
|
|
|
|
QueuePosition pos;
|
|
|
|
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
Assert(pid != InvalidPid);
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
pos = QUEUE_BACKEND_POS(i);
|
|
|
|
if (QUEUE_BACKEND_DBOID(i) == MyDatabaseId)
|
2001-06-12 07:55:50 +02:00
|
|
|
{
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
/*
|
|
|
|
* Always signal listeners in our own database, unless they're
|
|
|
|
* already caught up (unlikely, but possible).
|
|
|
|
*/
|
|
|
|
if (QUEUE_POS_EQUAL(pos, QUEUE_HEAD))
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Listeners in other databases should be signaled only if they
|
|
|
|
* are far behind.
|
|
|
|
*/
|
|
|
|
if (asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD),
|
|
|
|
QUEUE_POS_PAGE(pos)) < QUEUE_CLEANUP_DELAY)
|
|
|
|
continue;
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
/* OK, need to signal this one */
|
|
|
|
pids[count] = pid;
|
|
|
|
ids[count] = i;
|
|
|
|
count++;
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
1998-10-06 04:40:09 +02:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* Now send signals */
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
for (int i = 0; i < count; i++)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present. This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date. It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies. We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.
Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly. Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless. This change can
substantially reduce signal traffic if listeners are spread among
many databases. It won't help for the common case of only a single
active database, but the extra check costs very little.
Martijn van Oosterhout, with some adjustments by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 17:46:29 +02:00
|
|
|
int32 pid = pids[i];
|
2010-02-16 23:34:57 +01:00
|
|
|
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
/*
|
|
|
|
* If we are signaling our own process, no need to involve the kernel;
|
|
|
|
* just set the flag directly.
|
|
|
|
*/
|
|
|
|
if (pid == MyProcPid)
|
|
|
|
{
|
|
|
|
notifyInterruptPending = true;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Note: assuming things aren't broken, a signal failure here could
|
|
|
|
* only occur if the target backend exited since we released
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* NotifyQueueLock; which is unlikely but certainly possible. So we
|
2010-02-16 23:34:57 +01:00
|
|
|
* just log a low-level debug message if it happens.
|
|
|
|
*/
|
|
|
|
if (SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, ids[i]) < 0)
|
|
|
|
elog(DEBUG3, "could not signal backend with PID %d: %m", pid);
|
|
|
|
}
|
|
|
|
|
|
|
|
pfree(pids);
|
|
|
|
pfree(ids);
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* AtAbort_Notify
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* This is called at transaction abort.
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
2010-02-16 23:34:57 +01:00
|
|
|
* Gets rid of pending actions and outbound notifies that we would have
|
|
|
|
* executed if the transaction got committed.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
|
|
|
void
|
2001-06-18 00:27:15 +02:00
|
|
|
AtAbort_Notify(void)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
2013-02-13 18:48:05 +01:00
|
|
|
* If we LISTEN but then roll back the transaction after PreCommit_Notify,
|
|
|
|
* we have registered as a listener but have not made any entry in
|
|
|
|
* listenChannels. In that case, deregister again.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
2013-02-13 18:48:05 +01:00
|
|
|
if (amRegisteredListener && listenChannels == NIL)
|
|
|
|
asyncQueueUnregister();
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/* And clean up */
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
ClearPendingActionsAndNotifies();
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
|
|
|
|
2004-07-01 02:52:04 +02:00
|
|
|
/*
|
|
|
|
* AtSubCommit_Notify() --- Take care of subtransaction commit.
|
|
|
|
*
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
* Reassign all items in the pending lists to the parent transaction.
|
2004-07-01 02:52:04 +02:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
AtSubCommit_Notify(void)
|
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
/* If there are actions at our nesting level, we must reparent them. */
|
|
|
|
if (pendingActions != NULL &&
|
|
|
|
pendingActions->nestingLevel >= my_level)
|
|
|
|
{
|
|
|
|
if (pendingActions->upper == NULL ||
|
|
|
|
pendingActions->upper->nestingLevel < my_level - 1)
|
|
|
|
{
|
|
|
|
/* nothing to merge; give the whole thing to the parent */
|
|
|
|
--pendingActions->nestingLevel;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
ActionList *childPendingActions = pendingActions;
|
2004-07-01 02:52:04 +02:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
pendingActions = pendingActions->upper;
|
2004-09-07 01:33:48 +02:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
/*
|
|
|
|
* Mustn't try to eliminate duplicates here --- see queue_listen()
|
|
|
|
*/
|
|
|
|
pendingActions->actions =
|
|
|
|
list_concat(pendingActions->actions,
|
|
|
|
childPendingActions->actions);
|
|
|
|
pfree(childPendingActions);
|
|
|
|
}
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
}
|
2019-10-04 14:19:25 +02:00
|
|
|
|
|
|
|
/* If there are notifies at our nesting level, we must reparent them. */
|
|
|
|
if (pendingNotifies != NULL &&
|
|
|
|
pendingNotifies->nestingLevel >= my_level)
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
Assert(pendingNotifies->nestingLevel == my_level);
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
if (pendingNotifies->upper == NULL ||
|
|
|
|
pendingNotifies->upper->nestingLevel < my_level - 1)
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
/* nothing to merge; give the whole thing to the parent */
|
|
|
|
--pendingNotifies->nestingLevel;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Formerly, we didn't bother to eliminate duplicates here, but
|
|
|
|
* now we must, else we fall foul of "Assert(!found)", either here
|
|
|
|
* or during a later attempt to build the parent-level hashtable.
|
|
|
|
*/
|
|
|
|
NotificationList *childPendingNotifies = pendingNotifies;
|
|
|
|
ListCell *l;
|
|
|
|
|
|
|
|
pendingNotifies = pendingNotifies->upper;
|
|
|
|
/* Insert all the subxact's events into parent, except for dups */
|
|
|
|
foreach(l, childPendingNotifies->events)
|
|
|
|
{
|
|
|
|
Notification *childn = (Notification *) lfirst(l);
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
if (!AsyncExistsPendingNotify(childn))
|
|
|
|
AddEventToPendingNotifies(childn);
|
|
|
|
}
|
|
|
|
pfree(childPendingNotifies);
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
}
|
|
|
|
}
|
2004-07-01 02:52:04 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* AtSubAbort_Notify() --- Take care of subtransaction abort.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
AtSubAbort_Notify(void)
|
|
|
|
{
|
2004-09-07 01:33:48 +02:00
|
|
|
int my_level = GetCurrentTransactionNestLevel();
|
|
|
|
|
2004-07-01 02:52:04 +02:00
|
|
|
/*
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
* All we have to do is pop the stack --- the actions/notifies made in
|
2004-07-01 02:52:04 +02:00
|
|
|
* this subxact are no longer interesting, and the space will be freed
|
2019-10-04 14:19:25 +02:00
|
|
|
* when CurTransactionContext is recycled. We still have to free the
|
|
|
|
* ActionList and NotificationList objects themselves, though, because
|
|
|
|
* those are allocated in TopTransactionContext.
|
2004-09-07 01:33:48 +02:00
|
|
|
*
|
2019-10-04 14:19:25 +02:00
|
|
|
* Note that there might be no entries at all, or no entries for the
|
Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail. Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability. To fix, just make the function call
asyncQueueAdvanceTail before computing its result. That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.
Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before). Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker. There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.
In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 20:09:33 +01:00
|
|
|
* current subtransaction level, either because none were ever created, or
|
|
|
|
* because we reentered this routine due to trouble during subxact abort.
|
2004-07-01 02:52:04 +02:00
|
|
|
*/
|
2019-10-04 14:19:25 +02:00
|
|
|
while (pendingActions != NULL &&
|
|
|
|
pendingActions->nestingLevel >= my_level)
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
ActionList *childPendingActions = pendingActions;
|
|
|
|
|
|
|
|
pendingActions = pendingActions->upper;
|
|
|
|
pfree(childPendingActions);
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
}
|
|
|
|
|
2019-10-04 14:19:25 +02:00
|
|
|
while (pendingNotifies != NULL &&
|
|
|
|
pendingNotifies->nestingLevel >= my_level)
|
2004-09-07 01:33:48 +02:00
|
|
|
{
|
2019-10-04 14:19:25 +02:00
|
|
|
NotificationList *childPendingNotifies = pendingNotifies;
|
|
|
|
|
|
|
|
pendingNotifies = pendingNotifies->upper;
|
|
|
|
pfree(childPendingNotifies);
|
2004-09-07 01:33:48 +02:00
|
|
|
}
|
2004-07-01 02:52:04 +02:00
|
|
|
}
|
|
|
|
|
1998-10-06 04:40:09 +02:00
|
|
|
/*
|
2009-07-31 22:26:23 +02:00
|
|
|
* HandleNotifyInterrupt
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
* Signal handler portion of interrupt handling. Let the backend know
|
|
|
|
* that there's a pending notify interrupt. If we're currently reading
|
|
|
|
* from the client, this will interrupt the read and
|
|
|
|
* ProcessClientReadInterrupt() will call ProcessNotifyInterrupt().
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
|
|
|
void
|
2009-07-31 22:26:23 +02:00
|
|
|
HandleNotifyInterrupt(void)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
|
|
|
/*
|
2009-07-31 22:26:23 +02:00
|
|
|
* Note: this is called by a SIGNAL HANDLER. You must be very wary what
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
* you do here.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
|
|
|
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
/* signal that work needs to be done */
|
|
|
|
notifyInterruptPending = true;
|
2003-02-18 03:53:29 +01:00
|
|
|
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
/* make sure the event is processed in due course */
|
|
|
|
SetLatch(MyLatch);
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
* ProcessNotifyInterrupt
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
2019-11-24 20:42:59 +01:00
|
|
|
* This is called if we see notifyInterruptPending set, just before
|
|
|
|
* transmitting ReadyForQuery at the end of a frontend command, and
|
|
|
|
* also if a notify signal occurs while reading from the frontend.
|
|
|
|
* HandleNotifyInterrupt() will cause the read to be interrupted
|
|
|
|
* via the process's latch, and this routine will get called.
|
|
|
|
* If we are truly idle (ie, *not* inside a transaction block),
|
|
|
|
* process the incoming notifies.
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
*
|
|
|
|
* If "flush" is true, force any frontend messages out immediately.
|
|
|
|
* This can be false when being called at the end of a frontend command,
|
|
|
|
* since we'll flush after sending ReadyForQuery.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
|
|
|
void
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
ProcessNotifyInterrupt(bool flush)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
2003-10-16 18:50:41 +02:00
|
|
|
if (IsTransactionOrTransactionBlock())
|
1998-10-06 04:40:09 +02:00
|
|
|
return; /* not really idle */
|
|
|
|
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
/* Loop in case another signal arrives while sending messages */
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
while (notifyInterruptPending)
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
ProcessIncomingNotify(flush);
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/*
|
|
|
|
* Read all pending notifications from the queue, and deliver appropriate
|
|
|
|
* ones to my frontend. Stop when we reach queue head or an uncommitted
|
|
|
|
* notification.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
asyncQueueReadAllNotifications(void)
|
|
|
|
{
|
2015-01-26 17:57:33 +01:00
|
|
|
volatile QueuePosition pos;
|
2010-02-16 23:34:57 +01:00
|
|
|
QueuePosition head;
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
Snapshot snapshot;
|
2010-02-26 03:01:40 +01:00
|
|
|
|
2010-02-16 23:34:57 +01:00
|
|
|
/* page_buffer must be adequately aligned, so use a union */
|
|
|
|
union
|
|
|
|
{
|
|
|
|
char buf[QUEUE_PAGESIZE];
|
|
|
|
AsyncQueueEntry align;
|
|
|
|
} page_buffer;
|
|
|
|
|
|
|
|
/* Fetch current state */
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_SHARED);
|
2010-02-16 23:34:57 +01:00
|
|
|
/* Assert checks that we have a valid state entry */
|
|
|
|
Assert(MyProcPid == QUEUE_BACKEND_PID(MyBackendId));
|
2020-09-05 19:17:32 +02:00
|
|
|
pos = QUEUE_BACKEND_POS(MyBackendId);
|
2010-02-16 23:34:57 +01:00
|
|
|
head = QUEUE_HEAD;
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
if (QUEUE_POS_EQUAL(pos, head))
|
|
|
|
{
|
|
|
|
/* Nothing to do, we have read all notifications already. */
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*----------
|
2019-11-25 00:03:39 +01:00
|
|
|
* Get snapshot we'll use to decide which xacts are still in progress.
|
|
|
|
* This is trickier than it might seem, because of race conditions.
|
2010-02-16 23:34:57 +01:00
|
|
|
* Consider the following example:
|
|
|
|
*
|
|
|
|
* Backend 1: Backend 2:
|
|
|
|
*
|
|
|
|
* transaction starts
|
2019-11-25 00:03:39 +01:00
|
|
|
* UPDATE foo SET ...;
|
2010-02-16 23:34:57 +01:00
|
|
|
* NOTIFY foo;
|
|
|
|
* commit starts
|
2019-11-25 00:03:39 +01:00
|
|
|
* queue the notify message
|
2010-02-16 23:34:57 +01:00
|
|
|
* transaction starts
|
2019-11-25 00:03:39 +01:00
|
|
|
* LISTEN foo; -- first LISTEN in session
|
|
|
|
* SELECT * FROM foo WHERE ...;
|
2010-02-16 23:34:57 +01:00
|
|
|
* commit to clog
|
2019-11-25 00:03:39 +01:00
|
|
|
* commit starts
|
|
|
|
* add backend 2 to array of listeners
|
|
|
|
* advance to queue head (this code)
|
2010-02-16 23:34:57 +01:00
|
|
|
* commit to clog
|
|
|
|
*
|
2019-11-25 00:03:39 +01:00
|
|
|
* Transaction 2's SELECT has not seen the UPDATE's effects, since that
|
|
|
|
* wasn't committed yet. Ideally we'd ensure that client 2 would
|
|
|
|
* eventually get transaction 1's notify message, but there's no way
|
|
|
|
* to do that; until we're in the listener array, there's no guarantee
|
|
|
|
* that the notify message doesn't get removed from the queue.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
2019-11-25 00:03:39 +01:00
|
|
|
* Therefore the coding technique transaction 2 is using is unsafe:
|
|
|
|
* applications must commit a LISTEN before inspecting database state,
|
|
|
|
* if they want to ensure they will see notifications about subsequent
|
|
|
|
* changes to that state.
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
2019-11-25 00:03:39 +01:00
|
|
|
* What we do guarantee is that we'll see all notifications from
|
|
|
|
* transactions committing after the snapshot we take here.
|
|
|
|
* Exec_ListenPreCommit has already added us to the listener array,
|
|
|
|
* so no not-yet-committed messages can be removed from the queue
|
|
|
|
* before we see them.
|
2010-02-16 23:34:57 +01:00
|
|
|
*----------
|
|
|
|
*/
|
2019-11-25 00:03:39 +01:00
|
|
|
snapshot = RegisterSnapshot(GetLatestSnapshot());
|
|
|
|
|
|
|
|
/*
|
|
|
|
* It is possible that we fail while trying to send a message to our
|
|
|
|
* frontend (for example, because of encoding conversion failure). If
|
|
|
|
* that happens it is critical that we not try to send the same message
|
|
|
|
* over and over again. Therefore, we place a PG_TRY block here that will
|
|
|
|
* forcibly advance our queue position before we lose control to an error.
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* (We could alternatively retake NotifyQueueLock and move the position
|
2019-11-25 00:03:39 +01:00
|
|
|
* before handling each individual message, but that seems like too much
|
|
|
|
* lock traffic.)
|
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
PG_TRY();
|
|
|
|
{
|
|
|
|
bool reachedStop;
|
|
|
|
|
|
|
|
do
|
|
|
|
{
|
|
|
|
int curpage = QUEUE_POS_PAGE(pos);
|
|
|
|
int curoffset = QUEUE_POS_OFFSET(pos);
|
|
|
|
int slotno;
|
|
|
|
int copysize;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We copy the data from SLRU into a local buffer, so as to avoid
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* holding the NotifySLRULock while we are examining the entries
|
|
|
|
* and possibly transmitting them to our frontend. Copy only the
|
|
|
|
* part of the page we will actually inspect.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
|
2010-02-16 23:34:57 +01:00
|
|
|
InvalidTransactionId);
|
|
|
|
if (curpage == QUEUE_POS_PAGE(head))
|
|
|
|
{
|
|
|
|
/* we only want to read as far as head */
|
|
|
|
copysize = QUEUE_POS_OFFSET(head) - curoffset;
|
|
|
|
if (copysize < 0)
|
|
|
|
copysize = 0; /* just for safety */
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* fetch all the rest of the page */
|
|
|
|
copysize = QUEUE_PAGESIZE - curoffset;
|
|
|
|
}
|
|
|
|
memcpy(page_buffer.buf + curoffset,
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
NotifyCtl->shared->page_buffer[slotno] + curoffset,
|
2010-02-16 23:34:57 +01:00
|
|
|
copysize);
|
|
|
|
/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifySLRULock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Process messages up to the stop position, end of page, or an
|
|
|
|
* uncommitted message.
|
|
|
|
*
|
|
|
|
* Our stop position is what we found to be the head's position
|
|
|
|
* when we entered this function. It might have changed already.
|
|
|
|
* But if it has, we will receive (or have already received and
|
|
|
|
* queued) another signal and come here again.
|
|
|
|
*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* We are not holding NotifyQueueLock here! The queue can only
|
2010-02-16 23:34:57 +01:00
|
|
|
* extend beyond the head pointer (see above) and we leave our
|
|
|
|
* backend's pointer where it is so nobody will truncate or
|
|
|
|
* rewrite pages under us. Especially we don't want to hold a lock
|
|
|
|
* while sending the notifications to the frontend.
|
|
|
|
*/
|
|
|
|
reachedStop = asyncQueueProcessPageEntries(&pos, head,
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
page_buffer.buf,
|
|
|
|
snapshot);
|
2010-02-16 23:34:57 +01:00
|
|
|
} while (!reachedStop);
|
|
|
|
}
|
2019-11-01 11:09:52 +01:00
|
|
|
PG_FINALLY();
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
/* Update shared state */
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_SHARED);
|
2010-02-16 23:34:57 +01:00
|
|
|
QUEUE_BACKEND_POS(MyBackendId) = pos;
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
PG_END_TRY();
|
|
|
|
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
/* Done with snapshot */
|
|
|
|
UnregisterSnapshot(snapshot);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Fetch notifications from the shared queue, beginning at position current,
|
|
|
|
* and deliver relevant ones to my frontend.
|
|
|
|
*
|
|
|
|
* The current page must have been fetched into page_buffer from shared
|
|
|
|
* memory. (We could access the page right in shared memory, but that
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* would imply holding the NotifySLRULock throughout this routine.)
|
2010-02-16 23:34:57 +01:00
|
|
|
*
|
|
|
|
* We stop if we reach the "stop" position, or reach a notification from an
|
|
|
|
* uncommitted transaction, or reach the end of the page.
|
|
|
|
*
|
|
|
|
* The function returns true once we have reached the stop position or an
|
|
|
|
* uncommitted notification, and false if we have finished with the page.
|
|
|
|
* In other words: once it returns true there is no need to look further.
|
2010-02-17 17:54:06 +01:00
|
|
|
* The QueuePosition *current is advanced past all processed messages.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
static bool
|
2015-01-26 17:57:33 +01:00
|
|
|
asyncQueueProcessPageEntries(volatile QueuePosition *current,
|
2010-02-16 23:34:57 +01:00
|
|
|
QueuePosition stop,
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
char *page_buffer,
|
|
|
|
Snapshot snapshot)
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
bool reachedStop = false;
|
|
|
|
bool reachedEndOfPage;
|
|
|
|
AsyncQueueEntry *qe;
|
|
|
|
|
|
|
|
do
|
|
|
|
{
|
2010-02-17 17:54:06 +01:00
|
|
|
QueuePosition thisentry = *current;
|
|
|
|
|
|
|
|
if (QUEUE_POS_EQUAL(thisentry, stop))
|
2010-02-16 23:34:57 +01:00
|
|
|
break;
|
|
|
|
|
2010-02-17 17:54:06 +01:00
|
|
|
qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Advance *current over this message, possibly to the next page. As
|
|
|
|
* noted in the comments for asyncQueueReadAllNotifications, we must
|
|
|
|
* do this before possibly failing while processing the message.
|
|
|
|
*/
|
|
|
|
reachedEndOfPage = asyncQueueAdvance(current, qe->length);
|
|
|
|
|
|
|
|
/* Ignore messages destined for other databases */
|
|
|
|
if (qe->dboid == MyDatabaseId)
|
|
|
|
{
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
if (XidInMVCCSnapshot(qe->xid, snapshot))
|
Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted. The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction. Which
of course would usually be the point of receiving the message. We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.
The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing. But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.
Per bug #9557 from Marko Tiikkaja. This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.
Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 17:02:54 +01:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* The source transaction is still in progress, so we can't
|
|
|
|
* process this message yet. Break out of the loop, but first
|
|
|
|
* back up *current so we will reprocess the message next
|
|
|
|
* time. (Note: it is unlikely but not impossible for
|
|
|
|
* TransactionIdDidCommit to fail, so we can't really avoid
|
|
|
|
* this advance-then-back-up behavior when dealing with an
|
|
|
|
* uncommitted message.)
|
|
|
|
*
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
* Note that we must test XidInMVCCSnapshot before we test
|
|
|
|
* TransactionIdDidCommit, else we might return a message from
|
|
|
|
* a transaction that is not yet visible to snapshots; compare
|
2019-01-22 02:03:15 +01:00
|
|
|
* the comments at the head of heapam_visibility.c.
|
Fix low-probability loss of NOTIFY messages due to XID wraparound.
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
2017-10-11 20:28:33 +02:00
|
|
|
*
|
|
|
|
* Also, while our own xact won't be listed in the snapshot,
|
|
|
|
* we need not check for TransactionIdIsCurrentTransactionId
|
|
|
|
* because our transaction cannot (yet) have queued any
|
|
|
|
* messages.
|
Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted. The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction. Which
of course would usually be the point of receiving the message. We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.
The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing. But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.
Per bug #9557 from Marko Tiikkaja. This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.
Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 17:02:54 +01:00
|
|
|
*/
|
|
|
|
*current = thisentry;
|
|
|
|
reachedStop = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
else if (TransactionIdDidCommit(qe->xid))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
/* qe->data is the null-terminated channel name */
|
|
|
|
char *channel = qe->data;
|
|
|
|
|
|
|
|
if (IsListeningOn(channel))
|
|
|
|
{
|
|
|
|
/* payload follows channel name */
|
|
|
|
char *payload = qe->data + strlen(channel) + 1;
|
|
|
|
|
|
|
|
NotifyMyFrontEnd(channel, payload, qe->srcPid);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted. The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction. Which
of course would usually be the point of receiving the message. We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.
The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing. But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.
Per bug #9557 from Marko Tiikkaja. This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.
Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 17:02:54 +01:00
|
|
|
* The source transaction aborted or crashed, so we just
|
|
|
|
* ignore its notifications.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Loop back if we're not at end of page */
|
|
|
|
} while (!reachedEndOfPage);
|
|
|
|
|
|
|
|
if (QUEUE_POS_EQUAL(*current, stop))
|
|
|
|
reachedStop = true;
|
|
|
|
|
|
|
|
return reachedStop;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Advance the shared queue tail variable to the minimum of all the
|
|
|
|
* per-backend tail pointers. Truncate pg_notify space if possible.
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
*
|
|
|
|
* This is (usually) called during CommitTransaction(), so it's important for
|
|
|
|
* it to have very low probability of failure.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
asyncQueueAdvanceTail(void)
|
|
|
|
{
|
|
|
|
QueuePosition min;
|
|
|
|
int oldtailpage;
|
|
|
|
int newtailpage;
|
|
|
|
int boundary;
|
|
|
|
|
2020-08-15 19:15:53 +02:00
|
|
|
/* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
|
|
|
|
LWLockAcquire(NotifyQueueTailLock, LW_EXCLUSIVE);
|
|
|
|
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
/*
|
|
|
|
* Compute the new tail. Pre-v13, it's essential that QUEUE_TAIL be exact
|
|
|
|
* (ie, exactly match at least one backend's queue position), so it must
|
|
|
|
* be updated atomically with the actual computation. Since v13, we could
|
|
|
|
* get away with not doing it like that, but it seems prudent to keep it
|
|
|
|
* so.
|
|
|
|
*
|
|
|
|
* Also, because incoming backends will scan forward from QUEUE_TAIL, that
|
|
|
|
* must be advanced before we can truncate any data. Thus, QUEUE_TAIL is
|
|
|
|
* the logical tail, while QUEUE_STOP_PAGE is the physical tail, or oldest
|
|
|
|
* un-truncated page. When QUEUE_STOP_PAGE != QUEUE_POS_PAGE(QUEUE_TAIL),
|
|
|
|
* there are pages we can truncate but haven't yet finished doing so.
|
|
|
|
*
|
|
|
|
* For concurrency's sake, we don't want to hold NotifyQueueLock while
|
|
|
|
* performing SimpleLruTruncate. This is OK because no backend will try
|
|
|
|
* to access the pages we are in the midst of truncating.
|
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
2010-02-16 23:34:57 +01:00
|
|
|
min = QUEUE_HEAD;
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends. That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.
A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends. We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.
In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends. If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.
Martijn van Oosterhout, hacked a bit more by me
Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-11 00:15:17 +02:00
|
|
|
Assert(QUEUE_BACKEND_PID(i) != InvalidPid);
|
|
|
|
min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i));
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
QUEUE_TAIL = min;
|
|
|
|
oldtailpage = QUEUE_STOP_PAGE;
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
LWLockRelease(NotifyQueueLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We can truncate something if the global tail advanced across an SLRU
|
|
|
|
* segment boundary.
|
|
|
|
*
|
|
|
|
* XXX it might be better to truncate only once every several segments, to
|
|
|
|
* reduce the number of directory scans.
|
|
|
|
*/
|
|
|
|
newtailpage = QUEUE_POS_PAGE(min);
|
|
|
|
boundary = newtailpage - (newtailpage % SLRU_PAGES_PER_SEGMENT);
|
2011-09-28 16:32:38 +02:00
|
|
|
if (asyncQueuePagePrecedes(oldtailpage, boundary))
|
2010-02-16 23:34:57 +01:00
|
|
|
{
|
|
|
|
/*
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
* SimpleLruTruncate() will ask for NotifySLRULock but will also
|
|
|
|
* release the lock again.
|
2010-02-16 23:34:57 +01:00
|
|
|
*/
|
Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 20:28:19 +02:00
|
|
|
SimpleLruTruncate(NotifyCtl, newtailpage);
|
2020-08-15 19:15:53 +02:00
|
|
|
|
Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 20:03:40 +01:00
|
|
|
/*
|
|
|
|
* Update QUEUE_STOP_PAGE. This changes asyncQueueIsFull()'s verdict
|
|
|
|
* for the segment immediately prior to the old tail, allowing fresh
|
|
|
|
* data into that segment.
|
|
|
|
*/
|
|
|
|
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
|
|
|
|
QUEUE_STOP_PAGE = newtailpage;
|
|
|
|
LWLockRelease(NotifyQueueLock);
|
|
|
|
}
|
2020-08-15 19:15:53 +02:00
|
|
|
|
|
|
|
LWLockRelease(NotifyQueueTailLock);
|
2010-02-16 23:34:57 +01:00
|
|
|
}
|
|
|
|
|
1998-10-06 04:40:09 +02:00
|
|
|
/*
|
1999-02-14 00:22:53 +01:00
|
|
|
* ProcessIncomingNotify
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* Scan the queue for arriving notifications and report them to the front
|
|
|
|
* end. The notifications might be from other sessions, or our own;
|
|
|
|
* there's no need to distinguish here.
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
*
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* If "flush" is true, force any frontend messages out immediately.
|
1998-10-06 04:40:09 +02:00
|
|
|
*
|
|
|
|
* NOTE: since we are outside any transaction, we must create our own.
|
|
|
|
*/
|
|
|
|
static void
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
ProcessIncomingNotify(bool flush)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
2010-09-23 23:16:51 +02:00
|
|
|
/* We *must* reset the flag */
|
Introduce and use infrastructure for interrupt processing during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
2015-02-03 22:25:20 +01:00
|
|
|
notifyInterruptPending = false;
|
2010-09-23 23:16:51 +02:00
|
|
|
|
|
|
|
/* Do nothing else if we aren't actively listening */
|
2010-02-16 23:34:57 +01:00
|
|
|
if (listenChannels == NIL)
|
|
|
|
return;
|
|
|
|
|
2000-05-31 02:28:42 +02:00
|
|
|
if (Trace_notify)
|
2003-05-27 19:49:47 +02:00
|
|
|
elog(DEBUG1, "ProcessIncomingNotify");
|
2000-05-31 02:28:42 +02:00
|
|
|
|
2020-03-11 16:36:40 +01:00
|
|
|
set_ps_display("notify interrupt");
|
1998-10-06 04:40:09 +02:00
|
|
|
|
|
|
|
/*
|
2010-02-16 23:34:57 +01:00
|
|
|
* We must run asyncQueueReadAllNotifications inside a transaction, else
|
|
|
|
* bad things happen if it gets an error.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
2010-02-16 23:34:57 +01:00
|
|
|
StartTransactionCommand();
|
|
|
|
|
|
|
|
asyncQueueReadAllNotifications();
|
1998-08-30 23:05:27 +02:00
|
|
|
|
2003-05-14 05:26:03 +02:00
|
|
|
CommitTransactionCommand();
|
1998-10-06 04:40:09 +02:00
|
|
|
|
|
|
|
/*
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* If this isn't an end-of-command case, we must flush the notify messages
|
|
|
|
* to ensure frontend gets them promptly.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
if (flush)
|
|
|
|
pq_flush();
|
1998-10-06 04:40:09 +02:00
|
|
|
|
2020-03-11 16:36:40 +01:00
|
|
|
set_ps_display("idle");
|
2000-05-31 02:28:42 +02:00
|
|
|
|
|
|
|
if (Trace_notify)
|
2003-05-27 19:49:47 +02:00
|
|
|
elog(DEBUG1, "ProcessIncomingNotify: done");
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2001-06-18 00:27:15 +02:00
|
|
|
/*
|
|
|
|
* Send NOTIFY message to my front end.
|
|
|
|
*/
|
Fix several mistakes around parallel workers and client_encoding.
Previously, workers sent data to the leader using the client encoding.
That mostly worked, but the leader the converted the data back to the
server encoding. Since not all encoding conversions are reversible,
that could provoke failures. Fix by using the database encoding for
all communication between worker and leader.
Also, while temporary changes to GUC settings, as from the SET clause
of a function, are in general OK for parallel query, changing
client_encoding this way inside of a parallel worker is not OK.
Previously, that would have confused the leader; with these changes,
it would not confuse the leader, but it wouldn't do anything either.
So refuse such changes in parallel workers.
Also, the previous code naively assumed that when it received a
NotifyResonse from the worker, it could pass that directly back to the
user. But now that worker-to-leader communication always uses the
database encoding, that's clearly no longer correct - though,
actually, the old way was always broken for V2 clients. So
disassemble and reconstitute the message instead.
Issues reported by Peter Eisentraut. Patch by me, reviewed by
Peter Eisentraut.
2016-07-01 00:35:32 +02:00
|
|
|
void
|
2010-02-16 23:34:57 +01:00
|
|
|
NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
2005-11-03 18:11:40 +01:00
|
|
|
if (whereToSendOutput == DestRemote)
|
1998-10-06 04:40:09 +02:00
|
|
|
{
|
1999-04-25 05:19:27 +02:00
|
|
|
StringInfoData buf;
|
1999-05-25 18:15:34 +02:00
|
|
|
|
2003-04-22 02:08:07 +02:00
|
|
|
pq_beginmessage(&buf, 'A');
|
2017-10-12 06:00:46 +02:00
|
|
|
pq_sendint32(&buf, srcPid);
|
2010-02-16 23:34:57 +01:00
|
|
|
pq_sendstring(&buf, channel);
|
2021-03-04 09:45:55 +01:00
|
|
|
pq_sendstring(&buf, payload);
|
1999-04-25 05:19:27 +02:00
|
|
|
pq_endmessage(&buf);
|
1999-05-25 18:15:34 +02:00
|
|
|
|
1998-10-06 04:40:09 +02:00
|
|
|
/*
|
Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 23:18:25 +02:00
|
|
|
* NOTE: we do not do pq_flush() here. Some level of caller will
|
|
|
|
* handle it later, allowing this message to be combined into a packet
|
|
|
|
* with other ones.
|
1998-10-06 04:40:09 +02:00
|
|
|
*/
|
|
|
|
}
|
|
|
|
else
|
2010-02-16 23:34:57 +01:00
|
|
|
elog(INFO, "NOTIFY for \"%s\" payload \"%s\"", channel, payload);
|
1998-10-06 04:40:09 +02:00
|
|
|
}
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* Does pendingNotifies include a match for the given event? */
|
2001-06-18 00:27:15 +02:00
|
|
|
static bool
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
AsyncExistsPendingNotify(Notification *n)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
if (pendingNotifies == NULL)
|
2010-02-16 23:34:57 +01:00
|
|
|
return false;
|
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
if (pendingNotifies->hashtab != NULL)
|
|
|
|
{
|
|
|
|
/* Use the hash table to probe for a match */
|
|
|
|
if (hash_search(pendingNotifies->hashtab,
|
|
|
|
&n,
|
|
|
|
HASH_FIND,
|
|
|
|
NULL))
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Must scan the event list */
|
|
|
|
ListCell *l;
|
2010-02-16 23:34:57 +01:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
foreach(l, pendingNotifies->events)
|
|
|
|
{
|
|
|
|
Notification *oldn = (Notification *) lfirst(l);
|
|
|
|
|
|
|
|
if (n->channel_len == oldn->channel_len &&
|
|
|
|
n->payload_len == oldn->payload_len &&
|
|
|
|
memcmp(n->data, oldn->data,
|
|
|
|
n->channel_len + n->payload_len + 2) == 0)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Add a notification event to a pre-existing pendingNotifies list.
|
|
|
|
*
|
|
|
|
* Because pendingNotifies->events is already nonempty, this works
|
|
|
|
* correctly no matter what CurrentMemoryContext is.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
AddEventToPendingNotifies(Notification *n)
|
|
|
|
{
|
|
|
|
Assert(pendingNotifies->events != NIL);
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* Create the hash table if it's time to */
|
|
|
|
if (list_length(pendingNotifies->events) >= MIN_HASHABLE_NOTIFIES &&
|
|
|
|
pendingNotifies->hashtab == NULL)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
HASHCTL hash_ctl;
|
|
|
|
ListCell *l;
|
|
|
|
|
|
|
|
/* Create the hash table */
|
|
|
|
hash_ctl.keysize = sizeof(Notification *);
|
|
|
|
hash_ctl.entrysize = sizeof(NotificationHash);
|
|
|
|
hash_ctl.hash = notification_hash;
|
|
|
|
hash_ctl.match = notification_match;
|
|
|
|
hash_ctl.hcxt = CurTransactionContext;
|
|
|
|
pendingNotifies->hashtab =
|
|
|
|
hash_create("Pending Notifies",
|
|
|
|
256L,
|
|
|
|
&hash_ctl,
|
|
|
|
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
|
|
|
|
|
|
|
|
/* Insert all the already-existing events */
|
|
|
|
foreach(l, pendingNotifies->events)
|
|
|
|
{
|
|
|
|
Notification *oldn = (Notification *) lfirst(l);
|
|
|
|
NotificationHash *hentry;
|
|
|
|
bool found;
|
|
|
|
|
|
|
|
hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab,
|
|
|
|
&oldn,
|
|
|
|
HASH_ENTER,
|
|
|
|
&found);
|
|
|
|
Assert(!found);
|
|
|
|
hentry->event = oldn;
|
|
|
|
}
|
|
|
|
}
|
2005-06-18 00:32:51 +02:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/* Add new event to the list, in order */
|
|
|
|
pendingNotifies->events = lappend(pendingNotifies->events, n);
|
|
|
|
|
|
|
|
/* Add event to the hash table if needed */
|
|
|
|
if (pendingNotifies->hashtab != NULL)
|
|
|
|
{
|
|
|
|
NotificationHash *hentry;
|
|
|
|
bool found;
|
|
|
|
|
|
|
|
hentry = (NotificationHash *) hash_search(pendingNotifies->hashtab,
|
|
|
|
&n,
|
|
|
|
HASH_ENTER,
|
|
|
|
&found);
|
|
|
|
Assert(!found);
|
|
|
|
hentry->event = n;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
/*
|
|
|
|
* notification_hash: hash function for notification hash table
|
|
|
|
*
|
|
|
|
* The hash "keys" are pointers to Notification structs.
|
|
|
|
*/
|
|
|
|
static uint32
|
|
|
|
notification_hash(const void *key, Size keysize)
|
|
|
|
{
|
|
|
|
const Notification *k = *(const Notification *const *) key;
|
|
|
|
|
|
|
|
Assert(keysize == sizeof(Notification *));
|
|
|
|
/* We don't bother to include the payload's trailing null in the hash */
|
|
|
|
return DatumGetUInt32(hash_any((const unsigned char *) k->data,
|
|
|
|
k->channel_len + k->payload_len + 1));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* notification_match: match function to use with notification_hash
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
notification_match(const void *key1, const void *key2, Size keysize)
|
|
|
|
{
|
|
|
|
const Notification *k1 = *(const Notification *const *) key1;
|
|
|
|
const Notification *k2 = *(const Notification *const *) key2;
|
|
|
|
|
|
|
|
Assert(keysize == sizeof(Notification *));
|
|
|
|
if (k1->channel_len == k2->channel_len &&
|
|
|
|
k1->payload_len == k2->payload_len &&
|
|
|
|
memcmp(k1->data, k2->data,
|
|
|
|
k1->channel_len + k1->payload_len + 2) == 0)
|
|
|
|
return 0; /* equal */
|
|
|
|
return 1; /* not equal */
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
/* Clear the pendingActions and pendingNotifies lists. */
|
1996-07-09 08:22:35 +02:00
|
|
|
static void
|
Fix LISTEN/NOTIFY race condition reported by Laurent Birtz, by postponing
pg_listener modifications commanded by LISTEN and UNLISTEN until the end
of the current transaction. This allows us to hold the ExclusiveLock on
pg_listener until after commit, with no greater risk of deadlock than there
was before. Aside from fixing the race condition, this gets rid of a
truly ugly kludge that was there before, namely having to ignore
HeapTupleBeingUpdated failures during NOTIFY. There is a small potential
incompatibility, which is that if a transaction issues LISTEN or UNLISTEN
and then looks into pg_listener before committing, it won't see any resulting
row insertion or deletion, where before it would have. It seems unlikely
that anyone would be depending on that, though.
This patch also disallows LISTEN and UNLISTEN inside a prepared transaction.
That case had some pretty undesirable properties already, such as possibly
allowing pg_listener entries to be made for PIDs no longer present, so
disallowing it seems like a better idea than trying to maintain the behavior.
2008-03-12 21:11:46 +01:00
|
|
|
ClearPendingActionsAndNotifies(void)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-06-18 00:27:15 +02:00
|
|
|
/*
|
2019-10-04 14:19:25 +02:00
|
|
|
* Everything's allocated in either TopTransactionContext or the context
|
Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail. Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability. To fix, just make the function call
asyncQueueAdvanceTail before computing its result. That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.
Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before). Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker. There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.
In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 20:09:33 +01:00
|
|
|
* for the subtransaction to which it corresponds. So, there's nothing to
|
|
|
|
* do here except reset the pointers; the space will be reclaimed when the
|
|
|
|
* contexts are deleted.
|
2001-06-18 00:27:15 +02:00
|
|
|
*/
|
2019-10-04 14:19:25 +02:00
|
|
|
pendingActions = NULL;
|
Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event. This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events. We can improve matters by
using a hash table to probe for duplicates. So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.
A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)
We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions. Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three. This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.
Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.
Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 18:22:12 +02:00
|
|
|
pendingNotifies = NULL;
|
1997-09-07 07:04:48 +02:00
|
|
|
}
|