/*-------------------------------------------------------------------------
 *
 * lwlock.c
 *	  Lightweight lock manager
 *
 * Lightweight locks are intended primarily to provide mutual exclusion of
 * access to shared-memory data structures.  Therefore, they offer both
 * exclusive and shared lock modes (to support read/write and read-only
 * access to a shared object).  There are few other frammishes.  User-level
 * locking should be done with the full lock manager --- which depends on
 * LWLocks to protect its shared state.
 *
 * In addition to exclusive and shared modes, lightweight locks can be used to
 * wait until a variable changes value.  The variable is initially not set
 * when the lock is acquired with LWLockAcquire, i.e. it remains set to the
 * value it was set to when the lock was last released, and can be updated
 * without releasing the lock by calling LWLockUpdateVar.  LWLockWaitForVar
 * waits for the variable to be updated, or until the lock is free.  When
 * releasing the lock with LWLockReleaseClearVar() the value can be set to an
 * appropriate value for a free lock.  The meaning of the variable is up to
 * the caller, the lightweight lock code just assigns and compares it.
 *
 * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/storage/lmgr/lwlock.c
 *
 * NOTES:
 *
 * This used to be a pretty straightforward reader-writer lock
 * implementation, in which the internal state was protected by a
 * spinlock.  Unfortunately the overhead of taking the spinlock proved to be
 * too high for workloads/locks that were taken in shared mode very
 * frequently.  Often we were spinning in the (obviously exclusive) spinlock,
 * while trying to acquire a shared lock that was actually free.
 *
 * Thus a new implementation was devised that provides wait-free shared lock
 * acquisition for locks that aren't exclusively locked.
 *
 * The basic idea is to have a single atomic variable 'lockcount' instead of
 * the formerly separate shared and exclusive counters and to use atomic
 * operations to acquire the lock.  That's fairly easy to do for plain
 * rw-spinlocks, but a lot harder for something like LWLocks that want to wait
 * in the OS.
 *
 * For lock acquisition we use an atomic compare-and-exchange on the lockcount
 * variable.  For exclusive lock we swap in a sentinel value
 * (LW_VAL_EXCLUSIVE), for shared locks we count the number of holders.
 *
 * To release the lock we use an atomic decrement.  If the new value is zero
 * (we get that atomically), we know we can/have to release waiters.
 *
 * Obviously it is important that the sentinel value for exclusive locks
 * doesn't conflict with the maximum number of possible share lockers -
 * luckily MAX_BACKENDS makes that easily possible.
 *
 *
 * The attentive reader might have noticed that naively doing the above has a
 * glaring race condition: We try to lock using the atomic operations and
 * notice that we have to wait.  Unfortunately by the time we have finished
 * queuing, the former locker very well might have already finished its
 * work.  That's problematic because we're now stuck waiting inside the OS.
 *
 * To mitigate those races we use a two-phased attempt at locking:
 *	 Phase 1: Try to do it atomically, if we succeed, nice
 *	 Phase 2: Add ourselves to the waitqueue of the lock
 *	 Phase 3: Try to grab the lock again, if we succeed, remove ourselves from
 *			  the queue
 *	 Phase 4: Sleep till wake-up, goto Phase 1
 *
 * This protects us against the problem from above as nobody can release too
 * quickly, before we're queued, since after Phase 2 we're already queued.
 * -------------------------------------------------------------------------
 */
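The NOTES above describe wait-free shared acquisition via a compare-and-exchange on a single state word, with a sentinel bit for exclusive mode. A minimal standalone sketch of that idea, using C11 atomics and made-up `demo_*` names (this is not PostgreSQL's actual LWLock type or API; a real caller would queue and sleep on failure, per Phases 2-4):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified state word: bit 24 is the exclusive sentinel,
 * bits 0-23 count shared holders (mirroring LW_VAL_EXCLUSIVE and
 * LW_SHARED_MASK, but without the waiter flag bits). */
#define DEMO_VAL_EXCLUSIVE	((uint32_t) 1 << 24)
#define DEMO_LOCK_MASK		((uint32_t) ((1 << 25) - 1))

typedef struct
{
	_Atomic uint32_t state;
} demo_lock;

/* Phase 1 of the protocol: a single atomic attempt.  Returns true on
 * success; false means the caller must queue itself and retry. */
static bool
demo_try_lock(demo_lock *lock, bool exclusive)
{
	uint32_t	old = atomic_load(&lock->state);

	for (;;)
	{
		uint32_t	desired;

		if (exclusive)
		{
			if ((old & DEMO_LOCK_MASK) != 0)
				return false;	/* held in some mode: must wait */
			desired = old | DEMO_VAL_EXCLUSIVE;
		}
		else
		{
			if (old & DEMO_VAL_EXCLUSIVE)
				return false;	/* exclusively held: must wait */
			desired = old + 1;	/* one more shared holder */
		}
		/* on failure, 'old' is reloaded and we retry the computation */
		if (atomic_compare_exchange_weak(&lock->state, &old, desired))
			return true;
	}
}

/* Releasing a shared hold is the atomic decrement the comment mentions. */
static void
demo_unlock_shared(demo_lock *lock)
{
	atomic_fetch_sub(&lock->state, 1);
}
```

Note how a second shared locker never has to wait for the first, which is the wait-free property the rewrite was after.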

#include "postgres.h"

#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "postmaster/postmaster.h"
#include "replication/slot.h"
#include "storage/ipc.h"
#include "storage/predicate.h"
#include "storage/proc.h"
#include "storage/proclist.h"
#include "storage/spin.h"
#include "utils/memutils.h"

#ifdef LWLOCK_STATS
#include "utils/hsearch.h"
#endif

/* We use the ShmemLock spinlock to protect LWLockCounter */
extern slock_t *ShmemLock;

#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
#define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)
#define LW_FLAG_LOCKED				((uint32) 1 << 28)

#define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)
#define LW_VAL_SHARED				1

#define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
#define LW_SHARED_MASK				((uint32) ((1 << 24)-1))

StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
				 "MAX_BACKENDS too big for lwlock.c");
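The bit layout these definitions encode can be checked in isolation. A small illustrative recomputation, using `DEMO_`-prefixed copies of the constants (MAX_BACKENDS is assumed to be 2^23-1, as the comment states):

```c
#include <assert.h>
#include <stdint.h>

/* Recomputed copies of the lwlock.c constants, for an illustrative
 * sanity check of the state-word layout. */
#define DEMO_MAX_BACKENDS	((uint32_t) ((1 << 23) - 1))	/* assumed */
#define DEMO_VAL_EXCLUSIVE	((uint32_t) 1 << 24)
#define DEMO_SHARED_MASK	((uint32_t) ((1 << 24) - 1))
#define DEMO_LOCK_MASK		((uint32_t) ((1 << 25) - 1))
#define DEMO_FLAG_LOCKED	((uint32_t) 1 << 28)
```

Because the exclusive sentinel sits one bit above the largest possible shared-holder count, an exclusively held lock can never be confused with any number of share lockers, which is exactly what the StaticAssertDecl above enforces.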

/*
 * There are three sorts of LWLock "tranches":
 *
 * 1. The individually-named locks defined in lwlocknames.h each have their
 * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
 * in lwlocknames.c.
 *
 * 2. There are some predefined tranches for built-in groups of locks.
 * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
 * appear in BuiltinTrancheNames[] below.
 *
 * 3. Extensions can create new tranches, via either RequestNamedLWLockTranche
 * or LWLockRegisterTranche.  The names of these that are known in the current
 * process appear in LWLockTrancheNames[].
 *
 * All these names are user-visible as wait event names, so choose with care
 * ... and do not forget to update the documentation's list of wait events.
 */
extern const char *const IndividualLWLockNames[];	/* in lwlocknames.c */

static const char *const BuiltinTrancheNames[] = {
	/* LWTRANCHE_XACT_BUFFER: */
	"XactBuffer",
	/* LWTRANCHE_COMMITTS_BUFFER: */
	"CommitTsBuffer",
	/* LWTRANCHE_SUBTRANS_BUFFER: */
	"SubtransBuffer",
	/* LWTRANCHE_MULTIXACTOFFSET_BUFFER: */
	"MultiXactOffsetBuffer",
	/* LWTRANCHE_MULTIXACTMEMBER_BUFFER: */
	"MultiXactMemberBuffer",
	/* LWTRANCHE_NOTIFY_BUFFER: */
	"NotifyBuffer",
	/* LWTRANCHE_SERIAL_BUFFER: */
	"SerialBuffer",
	/* LWTRANCHE_WAL_INSERT: */
	"WALInsert",
	/* LWTRANCHE_BUFFER_CONTENT: */
	"BufferContent",
	/* LWTRANCHE_REPLICATION_ORIGIN_STATE: */
	"ReplicationOriginState",
	/* LWTRANCHE_REPLICATION_SLOT_IO: */
	"ReplicationSlotIO",
	/* LWTRANCHE_LOCK_FASTPATH: */
	"LockFastPath",
	/* LWTRANCHE_BUFFER_MAPPING: */
	"BufferMapping",
	/* LWTRANCHE_LOCK_MANAGER: */
	"LockManager",
	/* LWTRANCHE_PREDICATE_LOCK_MANAGER: */
	"PredicateLockManager",
	/* LWTRANCHE_PARALLEL_HASH_JOIN: */
	"ParallelHashJoin",
	/* LWTRANCHE_PARALLEL_QUERY_DSA: */
	"ParallelQueryDSA",
	/* LWTRANCHE_PER_SESSION_DSA: */
	"PerSessionDSA",
	/* LWTRANCHE_PER_SESSION_RECORD_TYPE: */
	"PerSessionRecordType",
	/* LWTRANCHE_PER_SESSION_RECORD_TYPMOD: */
	"PerSessionRecordTypmod",
	/* LWTRANCHE_SHARED_TUPLESTORE: */
	"SharedTupleStore",
	/* LWTRANCHE_SHARED_TIDBITMAP: */
	"SharedTidBitmap",
	/* LWTRANCHE_PARALLEL_APPEND: */
	"ParallelAppend",
	/* LWTRANCHE_PER_XACT_PREDICATE_LIST: */
	"PerXactPredicateList",
	/* LWTRANCHE_PGSTATS_DSA: */
	"PgStatsDSA",
	/* LWTRANCHE_PGSTATS_HASH: */
	"PgStatsHash",
	/* LWTRANCHE_PGSTATS_DATA: */
	"PgStatsData",
	/* LWTRANCHE_LAUNCHER_DSA: */
	"LogicalRepLauncherDSA",
	/* LWTRANCHE_LAUNCHER_HASH: */
	"LogicalRepLauncherHash",
};

StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
				 LWTRANCHE_FIRST_USER_DEFINED - NUM_INDIVIDUAL_LWLOCKS,
				 "missing entries in BuiltinTrancheNames[]");

/*
 * This is indexed by tranche ID minus LWTRANCHE_FIRST_USER_DEFINED, and
 * stores the names of all dynamically-created tranches known to the current
 * process.  Any unused entries in the array will contain NULL.
 */
static const char **LWLockTrancheNames = NULL;
static int	LWLockTrancheNamesAllocated = 0;
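The three-way split described in the tranche comment (individual locks, builtin tranches, dynamic tranches indexed from LWTRANCHE_FIRST_USER_DEFINED) amounts to a simple range dispatch. A hypothetical miniature with made-up names and boundary values (not PostgreSQL's real constants or lookup function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up miniature of the tranche-name dispatch: two "individual"
 * locks, then two "builtin" tranches, then dynamically registered ones. */
#define DEMO_NUM_INDIVIDUAL		2
#define DEMO_FIRST_USER_DEFINED	4

static const char *const demo_individual[] = {"ShmemIndex", "OidGen"};
static const char *const demo_builtin[] = {"XactBuffer", "CommitTsBuffer"};
static const char **demo_dynamic = NULL;	/* registered at runtime */
static int	demo_dynamic_allocated = 0;

static const char *
demo_tranche_name(int id)
{
	if (id < DEMO_NUM_INDIVIDUAL)
		return demo_individual[id];
	if (id < DEMO_FIRST_USER_DEFINED)
		return demo_builtin[id - DEMO_NUM_INDIVIDUAL];

	/* dynamic range: index relative to the first user-defined ID */
	id -= DEMO_FIRST_USER_DEFINED;
	if (demo_dynamic == NULL || id >= demo_dynamic_allocated ||
		demo_dynamic[id] == NULL)
		return "extension";		/* tranche not registered here */
	return demo_dynamic[id];
}
```

The "extension" fallback mirrors the fact that a dynamic tranche name may simply not be known in the current process, since registration is per-process.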

/*
 * This points to the main array of LWLocks in shared memory.  Backends inherit
 * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
 * where we have special measures to pass it down).
 */
LWLockPadded *MainLWLockArray = NULL;

/*
 * We use this structure to keep track of locked LWLocks for release
 * during error recovery.  Normally, only a few will be held at once, but
 * occasionally the number can be much higher; for example, the pg_buffercache
 * extension locks all buffer partitions simultaneously.
 */
#define MAX_SIMUL_LWLOCKS	200

/* struct representing the LWLocks we're holding */
typedef struct LWLockHandle
{
	LWLock	   *lock;
	LWLockMode	mode;
} LWLockHandle;

static int	num_held_lwlocks = 0;
static LWLockHandle held_lwlocks[MAX_SIMUL_LWLOCKS];

/* struct representing the LWLock tranche request for named tranche */
typedef struct NamedLWLockTrancheRequest
{
	char		tranche_name[NAMEDATALEN];
	int			num_lwlocks;
} NamedLWLockTrancheRequest;

static NamedLWLockTrancheRequest *NamedLWLockTrancheRequestArray = NULL;
static int	NamedLWLockTrancheRequestsAllocated = 0;

/*
 * NamedLWLockTrancheRequests is both the valid length of the request array,
 * and the length of the shared-memory NamedLWLockTrancheArray later on.
 * This variable and NamedLWLockTrancheArray are non-static so that
 * postmaster.c can copy them to child processes in EXEC_BACKEND builds.
 */
int			NamedLWLockTrancheRequests = 0;

/* points to data in shared memory: */
NamedLWLockTranche *NamedLWLockTrancheArray = NULL;

static void InitializeLWLocks(void);
static inline void LWLockReportWaitStart(LWLock *lock);
static inline void LWLockReportWaitEnd(void);
static const char *GetLWTrancheName(uint16 trancheId);

#define T_NAME(lock) \
	GetLWTrancheName((lock)->tranche)

#ifdef LWLOCK_STATS
typedef struct lwlock_stats_key
{
	int			tranche;
	void	   *instance;
} lwlock_stats_key;

typedef struct lwlock_stats
{
	lwlock_stats_key key;
	int			sh_acquire_count;
	int			ex_acquire_count;
	int			block_count;
	int			dequeue_self_count;
	int			spin_delay_count;
} lwlock_stats;

static HTAB *lwlock_stats_htab;
static lwlock_stats lwlock_stats_dummy;
#endif

#ifdef LOCK_DEBUG

bool		Trace_lwlocks = false;

inline static void
PRINT_LWDEBUG(const char *where, LWLock *lock, LWLockMode mode)
{
	/* hide statement & context here, otherwise the log is just too verbose */
	if (Trace_lwlocks)
	{
		uint32		state = pg_atomic_read_u32(&lock->state);

		ereport(LOG,
				(errhidestmt(true),
				 errhidecontext(true),
				 errmsg_internal("%d: %s(%s %p): excl %u shared %u haswaiters %u waiters %u rOK %d",
								 MyProcPid,
								 where, T_NAME(lock), lock,
								 (state & LW_VAL_EXCLUSIVE) != 0,
								 state & LW_SHARED_MASK,
								 (state & LW_FLAG_HAS_WAITERS) != 0,
								 pg_atomic_read_u32(&lock->nwaiters),
								 (state & LW_FLAG_RELEASE_OK) != 0)));
	}
}

inline static void
LOG_LWDEBUG(const char *where, LWLock *lock, const char *msg)
{
	/* hide statement & context here, otherwise the log is just too verbose */
	if (Trace_lwlocks)
	{
		ereport(LOG,
				(errhidestmt(true),
				 errhidecontext(true),
				 errmsg_internal("%s(%s %p): %s", where,
								 T_NAME(lock), lock, msg)));
	}
}

#else							/* not LOCK_DEBUG */
#define PRINT_LWDEBUG(a,b,c) ((void)0)
#define LOG_LWDEBUG(a,b,c) ((void)0)
#endif							/* LOCK_DEBUG */

#ifdef LWLOCK_STATS

static void init_lwlock_stats(void);
static void print_lwlock_stats(int code, Datum arg);
static lwlock_stats *get_lwlock_stats_entry(LWLock *lock);

static void
init_lwlock_stats(void)
{
	HASHCTL		ctl;
	static MemoryContext lwlock_stats_cxt = NULL;
	static bool exit_registered = false;

	if (lwlock_stats_cxt != NULL)
		MemoryContextDelete(lwlock_stats_cxt);

	/*
	 * The LWLock stats will be updated within a critical section, which
	 * requires allocating new hash entries. Allocations within a critical
	 * section are normally not allowed because running out of memory would
	 * lead to a PANIC, but LWLOCK_STATS is debugging code that's not normally
	 * turned on in production, so that's an acceptable risk. The hash entries
	 * are small, so the risk of running out of memory is minimal in practice.
	 */
	lwlock_stats_cxt = AllocSetContextCreate(TopMemoryContext,
											 "LWLock stats",
											 ALLOCSET_DEFAULT_SIZES);
	MemoryContextAllowInCriticalSection(lwlock_stats_cxt, true);

	ctl.keysize = sizeof(lwlock_stats_key);
	ctl.entrysize = sizeof(lwlock_stats);
	ctl.hcxt = lwlock_stats_cxt;
	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
	if (!exit_registered)
	{
		on_shmem_exit(print_lwlock_stats, 0);
		exit_registered = true;
	}
}

static void
print_lwlock_stats(int code, Datum arg)
{
	HASH_SEQ_STATUS scan;
	lwlock_stats *lwstats;

	hash_seq_init(&scan, lwlock_stats_htab);

	/* Grab an LWLock to keep different backends from mixing reports */
	LWLockAcquire(&MainLWLockArray[0].lock, LW_EXCLUSIVE);

	while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
	{
		fprintf(stderr,
				"PID %d lwlock %s %p: shacq %u exacq %u blk %u spindelay %u dequeue self %u\n",
				MyProcPid, GetLWTrancheName(lwstats->key.tranche),
				lwstats->key.instance, lwstats->sh_acquire_count,
				lwstats->ex_acquire_count, lwstats->block_count,
				lwstats->spin_delay_count, lwstats->dequeue_self_count);
	}

	LWLockRelease(&MainLWLockArray[0].lock);
}

static lwlock_stats *
get_lwlock_stats_entry(LWLock *lock)
{
	lwlock_stats_key key;
	lwlock_stats *lwstats;
	bool		found;

	/*
	 * During shared memory initialization, the hash table doesn't exist yet.
	 * Stats of that phase aren't very interesting, so just collect operations
	 * on all locks in a single dummy entry.
	 */
	if (lwlock_stats_htab == NULL)
		return &lwlock_stats_dummy;

	/* Fetch or create the entry. */
	MemSet(&key, 0, sizeof(key));
	key.tranche = lock->tranche;
	key.instance = lock;
	lwstats = hash_search(lwlock_stats_htab, &key, HASH_ENTER, &found);
	if (!found)
	{
		lwstats->sh_acquire_count = 0;
		lwstats->ex_acquire_count = 0;
		lwstats->block_count = 0;
		lwstats->dequeue_self_count = 0;
		lwstats->spin_delay_count = 0;
	}
	return lwstats;
}
#endif							/* LWLOCK_STATS */

/*
 * Compute number of LWLocks required by named tranches.  These will be
 * allocated in the main array.
 */
static int
NumLWLocksForNamedTranches(void)
{
	int			numLocks = 0;
	int			i;

	for (i = 0; i < NamedLWLockTrancheRequests; i++)
		numLocks += NamedLWLockTrancheRequestArray[i].num_lwlocks;

	return numLocks;
}

/*
 * Compute shmem space needed for LWLocks and named tranches.
 */
Size
LWLockShmemSize(void)
{
	Size		size;
	int			i;
	int			numLocks = NUM_FIXED_LWLOCKS;

	/* Calculate total number of locks needed in the main array. */
	numLocks += NumLWLocksForNamedTranches();

	/* Space for the LWLock array. */
	size = mul_size(numLocks, sizeof(LWLockPadded));

	/* Space for dynamic allocation counter, plus room for alignment. */
	size = add_size(size, sizeof(int) + LWLOCK_PADDED_SIZE);

	/* space for named tranches. */
	size = add_size(size, mul_size(NamedLWLockTrancheRequests, sizeof(NamedLWLockTranche)));

	/* space for name of each tranche. */
	for (i = 0; i < NamedLWLockTrancheRequests; i++)
		size = add_size(size, strlen(NamedLWLockTrancheRequestArray[i].tranche_name) + 1);

	return size;
}
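The size computation above leans on add_size() and mul_size(), which are overflow-checked Size arithmetic. A hedged standalone sketch of the same idea (the real PostgreSQL versions ereport(ERROR) on overflow; this hypothetical `demo_*` variant reports failure through an out-flag instead, so it can run outside a backend):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Overflow-checked addition: detects wraparound after the fact. */
static size_t
demo_add_size(size_t a, size_t b, bool *overflow)
{
	size_t		result = a + b;

	if (result < a)				/* wrapped around */
		*overflow = true;
	return result;
}

/* Overflow-checked multiplication: verify by dividing back. */
static size_t
demo_mul_size(size_t a, size_t b, bool *overflow)
{
	size_t		result;

	if (a == 0 || b == 0)
		return 0;
	result = a * b;
	if (result / b != a)		/* wrapped around */
		*overflow = true;
	return result;
}
```

Chaining every step through such helpers, as LWLockShmemSize() does, means a pathological lock count can never silently request a too-small shared-memory segment.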

/*
 * Allocate shmem space for the main LWLock array and all tranches and
 * initialize it.  We also register extension LWLock tranches here.
 */
void
CreateLWLocks(void)
{
	if (!IsUnderPostmaster)
	{
		Size		spaceLocks = LWLockShmemSize();
		int		   *LWLockCounter;
		char	   *ptr;

		/* Allocate space */
		ptr = (char *) ShmemAlloc(spaceLocks);

		/* Leave room for dynamic allocation of tranches */
		ptr += sizeof(int);

		/* Ensure desired alignment of LWLock array */
		ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;

		MainLWLockArray = (LWLockPadded *) ptr;

		/*
		 * Initialize the dynamic-allocation counter for tranches, which is
		 * stored just before the first LWLock.
		 */
		LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
		*LWLockCounter = LWTRANCHE_FIRST_USER_DEFINED;

		/* Initialize all LWLocks */
		InitializeLWLocks();
	}

	/* Register named extension LWLock tranches in the current process. */
	for (int i = 0; i < NamedLWLockTrancheRequests; i++)
		LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
							  NamedLWLockTrancheArray[i].trancheName);
}

/*
 * Initialize LWLocks that are fixed and those belonging to named tranches.
 */
static void
InitializeLWLocks(void)
{
    int         numNamedLocks = NumLWLocksForNamedTranches();
    int         id;
    int         i;
    int         j;
    LWLockPadded *lock;

    /* Initialize all individual LWLocks in main array */
    for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
        LWLockInitialize(&lock->lock, id);

    /* Initialize buffer mapping LWLocks in main array */
    lock = MainLWLockArray + BUFFER_MAPPING_LWLOCK_OFFSET;
    for (id = 0; id < NUM_BUFFER_PARTITIONS; id++, lock++)
        LWLockInitialize(&lock->lock, LWTRANCHE_BUFFER_MAPPING);

    /* Initialize lmgrs' LWLocks in main array */
    lock = MainLWLockArray + LOCK_MANAGER_LWLOCK_OFFSET;
    for (id = 0; id < NUM_LOCK_PARTITIONS; id++, lock++)
        LWLockInitialize(&lock->lock, LWTRANCHE_LOCK_MANAGER);

    /* Initialize predicate lmgrs' LWLocks in main array */
    lock = MainLWLockArray + PREDICATELOCK_MANAGER_LWLOCK_OFFSET;
    for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
        LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);

    /*
     * Copy the info about any named tranches into shared memory (so that
     * other processes can see it), and initialize the requested LWLocks.
     */
    if (NamedLWLockTrancheRequests > 0)
    {
        char       *trancheNames;

        NamedLWLockTrancheArray = (NamedLWLockTranche *)
            &MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];

        trancheNames = (char *) NamedLWLockTrancheArray +
            (NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
        lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];

        for (i = 0; i < NamedLWLockTrancheRequests; i++)
        {
            NamedLWLockTrancheRequest *request;
            NamedLWLockTranche *tranche;
            char       *name;

            request = &NamedLWLockTrancheRequestArray[i];
            tranche = &NamedLWLockTrancheArray[i];

            name = trancheNames;
            trancheNames += strlen(request->tranche_name) + 1;
            strcpy(name, request->tranche_name);
            tranche->trancheId = LWLockNewTrancheId();
            tranche->trancheName = name;

            for (j = 0; j < request->num_lwlocks; j++, lock++)
                LWLockInitialize(&lock->lock, tranche->trancheId);
        }
    }
}

/*
 * InitLWLockAccess - initialize backend-local state needed to hold LWLocks
 */
void
InitLWLockAccess(void)
{
#ifdef LWLOCK_STATS
    init_lwlock_stats();
#endif
}

/*
 * GetNamedLWLockTranche - returns the base address of the LWLocks in the
 *      specified tranche.
 *
 * The caller needs to retrieve the requested number of LWLocks starting from
 * the base lock address returned by this API.  This can be used for tranches
 * that were requested via the RequestNamedLWLockTranche() API.
 */
LWLockPadded *
GetNamedLWLockTranche(const char *tranche_name)
{
    int         lock_pos;
    int         i;

    /*
     * Obtain the position of the base address of the LWLocks belonging to the
     * requested tranche_name in MainLWLockArray.  LWLocks for named tranches
     * are placed in MainLWLockArray after the fixed locks.
     */
    lock_pos = NUM_FIXED_LWLOCKS;
    for (i = 0; i < NamedLWLockTrancheRequests; i++)
    {
        if (strcmp(NamedLWLockTrancheRequestArray[i].tranche_name,
                   tranche_name) == 0)
            return &MainLWLockArray[lock_pos];

        lock_pos += NamedLWLockTrancheRequestArray[i].num_lwlocks;
    }

    elog(ERROR, "requested tranche is not registered");

    /* just to keep compiler quiet */
    return NULL;
}

/*
 * Allocate a new tranche ID.
 */
int
LWLockNewTrancheId(void)
{
    int         result;
    int        *LWLockCounter;

    LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
    SpinLockAcquire(ShmemLock);
    result = (*LWLockCounter)++;
    SpinLockRelease(ShmemLock);

    return result;
}

/*
 * Register a dynamic tranche name in the lookup table of the current process.
 *
 * This routine will save a pointer to the tranche name passed as an argument,
 * so the name should be allocated in a backend-lifetime context
 * (shared memory, TopMemoryContext, static constant, or similar).
 *
 * The tranche name will be user-visible as a wait event name, so try to
 * use a name that fits the style for those.
 */
void
LWLockRegisterTranche(int tranche_id, const char *tranche_name)
{
    /* This should only be called for user-defined tranches. */
    if (tranche_id < LWTRANCHE_FIRST_USER_DEFINED)
        return;

    /* Convert to array index. */
    tranche_id -= LWTRANCHE_FIRST_USER_DEFINED;

    /* If necessary, create or enlarge array. */
    if (tranche_id >= LWLockTrancheNamesAllocated)
    {
        int         newalloc;

        newalloc = pg_nextpower2_32(Max(8, tranche_id + 1));

        if (LWLockTrancheNames == NULL)
            LWLockTrancheNames = (const char **)
                MemoryContextAllocZero(TopMemoryContext,
                                       newalloc * sizeof(char *));
        else
            LWLockTrancheNames =
                repalloc0_array(LWLockTrancheNames, const char *,
                                LWLockTrancheNamesAllocated, newalloc);
        LWLockTrancheNamesAllocated = newalloc;
    }

    LWLockTrancheNames[tranche_id] = tranche_name;
}

/*
 * RequestNamedLWLockTranche
 *      Request that extra LWLocks be allocated during postmaster
 *      startup.
 *
 * This may only be called via the shmem_request_hook of a library that is
 * loaded into the postmaster via shared_preload_libraries.  Calls from
 * elsewhere will fail.
 *
 * The tranche name will be user-visible as a wait event name, so try to
 * use a name that fits the style for those.
 */
void
RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
{
    NamedLWLockTrancheRequest *request;

    if (!process_shmem_requests_in_progress)
        elog(FATAL, "cannot request additional LWLocks outside shmem_request_hook");

    if (NamedLWLockTrancheRequestArray == NULL)
    {
        NamedLWLockTrancheRequestsAllocated = 16;
        NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
            MemoryContextAlloc(TopMemoryContext,
                               NamedLWLockTrancheRequestsAllocated
                               * sizeof(NamedLWLockTrancheRequest));
    }

    if (NamedLWLockTrancheRequests >= NamedLWLockTrancheRequestsAllocated)
    {
        int         i = pg_nextpower2_32(NamedLWLockTrancheRequests + 1);

        NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
            repalloc(NamedLWLockTrancheRequestArray,
                     i * sizeof(NamedLWLockTrancheRequest));
        NamedLWLockTrancheRequestsAllocated = i;
    }

    request = &NamedLWLockTrancheRequestArray[NamedLWLockTrancheRequests];
    Assert(strlen(tranche_name) + 1 <= NAMEDATALEN);
    strlcpy(request->tranche_name, tranche_name, NAMEDATALEN);
    request->num_lwlocks = num_lwlocks;
    NamedLWLockTrancheRequests++;
}

/*
 * LWLockInitialize - initialize a new lwlock; it's initially unlocked
 */
void
LWLockInitialize(LWLock *lock, int tranche_id)
{
    pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
#ifdef LOCK_DEBUG
    pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
    lock->tranche = tranche_id;
    proclist_init(&lock->waiters);
}

/*
 * Report start of wait event for light-weight locks.
 *
 * This function will be used by all the light-weight lock calls that need
 * to wait to acquire the lock.  It distinguishes the wait event based on
 * tranche and lock id.
 */
static inline void
LWLockReportWaitStart(LWLock *lock)
{
    pgstat_report_wait_start(PG_WAIT_LWLOCK | lock->tranche);
}

/*
 * Report end of wait event for light-weight locks.
 */
static inline void
LWLockReportWaitEnd(void)
{
    pgstat_report_wait_end();
}

/*
 * Return the name of an LWLock tranche.
 */
static const char *
GetLWTrancheName(uint16 trancheId)
{
    /* Individual LWLock? */
    if (trancheId < NUM_INDIVIDUAL_LWLOCKS)
        return IndividualLWLockNames[trancheId];

    /* Built-in tranche? */
    if (trancheId < LWTRANCHE_FIRST_USER_DEFINED)
        return BuiltinTrancheNames[trancheId - NUM_INDIVIDUAL_LWLOCKS];

    /*
     * It's an extension tranche, so look in LWLockTrancheNames[].  However,
     * it's possible that the tranche has never been registered in the current
     * process, in which case give up and return "extension".
     */
    trancheId -= LWTRANCHE_FIRST_USER_DEFINED;

    if (trancheId >= LWLockTrancheNamesAllocated ||
        LWLockTrancheNames[trancheId] == NULL)
        return "extension";

    return LWLockTrancheNames[trancheId];
}

/*
 * Return an identifier for an LWLock based on the wait class and event.
 */
const char *
GetLWLockIdentifier(uint32 classId, uint16 eventId)
{
    Assert(classId == PG_WAIT_LWLOCK);
    /* The event IDs are just tranche numbers. */
    return GetLWTrancheName(eventId);
}

/*
 * Internal function that tries to atomically acquire the lwlock in the
 * passed-in mode.
 *
 * This function will not block waiting for a lock to become free - that's the
 * caller's job.
 *
 * Returns true if the lock isn't free and we need to wait.
 */
static bool
LWLockAttemptLock(LWLock *lock, LWLockMode mode)
{
    uint32      old_state;

    Assert(mode == LW_EXCLUSIVE || mode == LW_SHARED);

    /*
     * Read once outside the loop; later iterations will get the newer value
     * via compare & exchange.
     */
    old_state = pg_atomic_read_u32(&lock->state);

    /* loop until we've determined whether we could acquire the lock or not */
    while (true)
    {
        uint32      desired_state;
        bool        lock_free;

        desired_state = old_state;

        if (mode == LW_EXCLUSIVE)
        {
            lock_free = (old_state & LW_LOCK_MASK) == 0;
            if (lock_free)
                desired_state += LW_VAL_EXCLUSIVE;
        }
        else
        {
            lock_free = (old_state & LW_VAL_EXCLUSIVE) == 0;
            if (lock_free)
                desired_state += LW_VAL_SHARED;
        }

        /*
         * Attempt to swap in the state we are expecting.  If we didn't see
         * the lock as free, that's just the old value.  If we saw it as
         * free, we'll attempt to mark it acquired.  The reason that we
         * always swap in the value is that this doubles as a memory barrier.
         * We could try to be smarter and only swap in values if we saw the
         * lock as free, but benchmarks haven't shown it as beneficial so
         * far.
         *
         * Retry if the value changed since we last looked at it.
         */
        if (pg_atomic_compare_exchange_u32(&lock->state,
                                           &old_state, desired_state))
        {
            if (lock_free)
            {
                /* Great! Got the lock. */
#ifdef LOCK_DEBUG
                if (mode == LW_EXCLUSIVE)
                    lock->owner = MyProc;
#endif
                return false;
            }
            else
                return true;    /* somebody else has the lock */
        }
    }
    pg_unreachable();
}

/*
 * Lock the LWLock's wait list against concurrent activity.
 *
 * NB: even though the wait list is locked, non-conflicting lock operations
 * may still happen concurrently.
 *
 * Time spent holding this mutex should be short!
 */
static void
LWLockWaitListLock(LWLock *lock)
{
    uint32      old_state;
#ifdef LWLOCK_STATS
    lwlock_stats *lwstats;
    uint32      delays = 0;

    lwstats = get_lwlock_stats_entry(lock);
#endif

    while (true)
    {
        /* always try once to acquire lock directly */
        old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
        if (!(old_state & LW_FLAG_LOCKED))
            break;              /* got lock */

        /* and then spin without atomic operations until lock is released */
        {
            SpinDelayStatus delayStatus;

            init_local_spin_delay(&delayStatus);

            while (old_state & LW_FLAG_LOCKED)
            {
                perform_spin_delay(&delayStatus);
                old_state = pg_atomic_read_u32(&lock->state);
            }
#ifdef LWLOCK_STATS
            delays += delayStatus.delays;
#endif
            finish_spin_delay(&delayStatus);
        }

        /*
         * Retry.  The lock might obviously already be re-acquired by the
         * time we're attempting to get it again.
         */
    }

#ifdef LWLOCK_STATS
    lwstats->spin_delay_count += delays;
#endif
}

/*
 * Unlock the LWLock's wait list.
 *
 * Note that it can be more efficient to manipulate flags and release the
 * locks in a single atomic operation.
 */
static void
LWLockWaitListUnlock(LWLock *lock)
{
    uint32      old_state PG_USED_FOR_ASSERTS_ONLY;

    old_state = pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_LOCKED);

    Assert(old_state & LW_FLAG_LOCKED);
}

/*
 * Wakeup all the lockers that currently have a chance to acquire the lock.
 */
static void
LWLockWakeup(LWLock *lock)
{
    bool        new_release_ok;
    bool        wokeup_somebody = false;
    proclist_head wakeup;
    proclist_mutable_iter iter;

    proclist_init(&wakeup);

    new_release_ok = true;

    /* lock wait list while collecting backends to wake up */
    LWLockWaitListLock(lock);

    proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
    {
        PGPROC     *waiter = GetPGProcByNumber(iter.cur);

        if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
            continue;

        proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
        proclist_push_tail(&wakeup, iter.cur, lwWaitLink);

        if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
        {
            /*
             * Prevent additional wakeups until the retryer gets to run.
             * Backends that are just waiting for the lock to become free
             * don't retry automatically.
             */
            new_release_ok = false;

            /*
             * Don't wakeup (further) exclusive locks.
             */
            wokeup_somebody = true;
        }

        /*
         * Signal that the process isn't on the wait list anymore.  This
         * allows LWLockDequeueSelf() to remove itself from the waitlist
         * with a proclist_delete(), rather than having to check if it has
         * been removed from the list.
         */
        Assert(waiter->lwWaiting == LW_WS_WAITING);
        waiter->lwWaiting = LW_WS_PENDING_WAKEUP;

        /*
         * Once we've woken up an exclusive lock, there's no point in waking
         * up anybody else.
         */
        if (waiter->lwWaitMode == LW_EXCLUSIVE)
            break;
    }

    Assert(proclist_is_empty(&wakeup) ||
           pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);

    /* unset required flags, and release lock, in one fell swoop */
    {
        uint32      old_state;
        uint32      desired_state;

        old_state = pg_atomic_read_u32(&lock->state);
        while (true)
        {
            desired_state = old_state;

            /* compute desired flags */

            if (new_release_ok)
                desired_state |= LW_FLAG_RELEASE_OK;
            else
                desired_state &= ~LW_FLAG_RELEASE_OK;

            if (proclist_is_empty(&wakeup))
                desired_state &= ~LW_FLAG_HAS_WAITERS;

            desired_state &= ~LW_FLAG_LOCKED;   /* release lock */

            if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                               desired_state))
                break;
        }
    }

    /* Awaken any waiters I removed from the queue. */
    proclist_foreach_modify(iter, &wakeup, lwWaitLink)
    {
        PGPROC     *waiter = GetPGProcByNumber(iter.cur);

        LOG_LWDEBUG("LWLockRelease", lock, "release waiter");
        proclist_delete(&wakeup, iter.cur, lwWaitLink);

        /*
         * Guarantee that lwWaiting being unset only becomes visible once the
         * unlink from the list has completed.  Otherwise the target backend
         * could be woken up for other reasons and enqueue for a new lock -
         * if that happens before the list unlink happens, the list would
         * end up being corrupted.
         *
         * The barrier pairs with the LWLockWaitListLock() when enqueuing
         * for another lock.
         */
        pg_write_barrier();
        waiter->lwWaiting = LW_WS_NOT_WAITING;
        PGSemaphoreUnlock(waiter->sem);
    }
}

/*
 * Add ourselves to the end of the queue.
 *
 * NB: Mode can be LW_WAIT_UNTIL_FREE here!
 */
static void
LWLockQueueSelf(LWLock *lock, LWLockMode mode)
{
    /*
     * If we don't have a PGPROC structure, there's no way to wait.  This
     * should never occur, since MyProc should only be null during shared
     * memory initialization.
     */
    if (MyProc == NULL)
        elog(PANIC, "cannot wait without a PGPROC structure");

    if (MyProc->lwWaiting != LW_WS_NOT_WAITING)
        elog(PANIC, "queueing for lock while waiting on another one");

    LWLockWaitListLock(lock);

    /* setting the flag is protected by the spinlock */
    pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_HAS_WAITERS);

    MyProc->lwWaiting = LW_WS_WAITING;
    MyProc->lwWaitMode = mode;

    /* LW_WAIT_UNTIL_FREE waiters are always at the front of the queue */
    if (mode == LW_WAIT_UNTIL_FREE)
        proclist_push_head(&lock->waiters, MyProc->pgprocno, lwWaitLink);
    else
        proclist_push_tail(&lock->waiters, MyProc->pgprocno, lwWaitLink);

    /* Can release the mutex now */
    LWLockWaitListUnlock(lock);

#ifdef LOCK_DEBUG
    pg_atomic_fetch_add_u32(&lock->nwaiters, 1);
#endif
}

/*
 * Remove ourselves from the waitlist.
 *
 * This is used if we queued ourselves because we thought we needed to sleep
 * but, after further checking, we discovered that we don't actually need to
 * do so.
 */
static void
LWLockDequeueSelf(LWLock *lock)
{
    bool        on_waitlist;

#ifdef LWLOCK_STATS
    lwlock_stats *lwstats;

    lwstats = get_lwlock_stats_entry(lock);

    lwstats->dequeue_self_count++;
#endif

    LWLockWaitListLock(lock);

    /*
     * Remove ourselves from the waitlist, unless we've already been removed.
     * The removal happens with the wait list lock held, so there's no race
     * in this check.
     */
    on_waitlist = MyProc->lwWaiting == LW_WS_WAITING;
    if (on_waitlist)
        proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);

    if (proclist_is_empty(&lock->waiters) &&
        (pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) != 0)
    {
        pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);
    }

    /* XXX: combine with fetch_and above? */
    LWLockWaitListUnlock(lock);

    /* clear waiting state again, nice for debugging */
    if (on_waitlist)
        MyProc->lwWaiting = LW_WS_NOT_WAITING;
    else
    {
        int         extraWaits = 0;

        /*
         * Somebody else dequeued us and has or will wake us up.  Deal with
         * the superfluous absorption of a wakeup.
         */

        /*
         * Reset the RELEASE_OK flag if somebody woke us before we removed
         * ourselves - they'll have set it to false.
         */
        pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

        /*
         * Now wait for the scheduled wakeup, otherwise our ->lwWaiting
         * would get reset at some inconvenient point later.  Most of the
         * time this will immediately return.
         */
        for (;;)
        {
            PGSemaphoreLock(MyProc->sem);
            if (MyProc->lwWaiting == LW_WS_NOT_WAITING)
                break;
            extraWaits++;
        }

        /*
         * Fix the process wait semaphore's count for any absorbed wakeups.
         */
        while (extraWaits-- > 0)
that assumption seems rather shaky. It doesn't seem out of the question
anymore that an extension compiled against one API choice might be loaded
into a postmaster built with another choice. Moreover, this prevents any
possibility of selecting the semaphore API at postmaster startup, which
might be something we want to do in future.
Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix
semaphore APIs, and turn the pointed-to data into an opaque struct whose
contents are only known within the responsible modules.
For the SysV and unnamed-POSIX APIs, the pointed-to data has to be
allocated elsewhere in shared memory, which takes a little bit of
rejiggering of the InitShmemAllocation code sequence. (I invented a
ShmemAllocUnlocked() function to make that a little cleaner than it used
to be. That function is not meant for any uses other than the ones it
has now, but it beats having InitShmemAllocation() know explicitly about
allocation of space for semaphores and spinlocks.) This change means an
extra indirection to access the semaphore data, but since we only touch
that when blocking or awakening a process, there shouldn't be any
meaningful performance penalty. Moreover, at least for the unnamed-POSIX
case on Linux, the sem_t type is quite a bit wider than a pointer, so this
reduces sizeof(PGPROC) which seems like a good thing.
For the named-POSIX API, there's effectively no change: the PGPROC.sem
field was and still is a pointer to something returned by sem_open() in
the postmaster's memory space. Document and check the pre-existing
limitation that this case can't work in EXEC_BACKEND mode.
It did not seem worth unifying the Windows semaphore ABI with the Unix
cases, since there's no likelihood of needing ABI compatibility much less
runtime switching across those cases. However, we can simplify the Windows
code a bit if we define PGSemaphore as being directly a HANDLE, rather than
pointer to HANDLE, so let's do that while we're here. (This also ends up
being no change in what's physically stored in PGPROC.sem. We're just
moving the HANDLE fetch from callees to callers.)
It would take a bunch of additional code shuffling to get to the point of
actually choosing a semaphore API at postmaster start, but the effects
of that would now be localized in the port/XXX_sema.c files, so it seems
like fit material for a separate patch. The need for it is unproven as
yet, anyhow, whereas the ABI risk to extensions seems real enough.
Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us
2016-12-12 19:32:10 +01:00
|
|
|
PGSemaphoreUnlock(MyProc->sem);
|
2014-12-25 17:24:30 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef LOCK_DEBUG
|
|
|
|
{
|
|
|
|
/* not waiting anymore */
|
2015-03-26 17:00:30 +01:00
|
|
|
uint32 nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);
|
2015-05-24 03:35:49 +02:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
Assert(nwaiters < MAX_BACKENDS);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}

/*
 * LWLockAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, sleep until it is.  Returns true if the lock
 * was available immediately, false if we had to sleep.
 *
 * Side effect: cancel/die interrupts are held off until lock release.
 */
bool
LWLockAcquire(LWLock *lock, LWLockMode mode)
{
	PGPROC	   *proc = MyProc;
	bool		result = true;
	int			extraWaits = 0;
#ifdef LWLOCK_STATS
	lwlock_stats *lwstats;

	lwstats = get_lwlock_stats_entry(lock);
#endif

	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);

	PRINT_LWDEBUG("LWLockAcquire", lock, mode);

#ifdef LWLOCK_STATS
	/* Count lock acquisition attempts */
	if (mode == LW_EXCLUSIVE)
		lwstats->ex_acquire_count++;
	else
		lwstats->sh_acquire_count++;
#endif							/* LWLOCK_STATS */

	/*
	 * We can't wait if we haven't got a PGPROC.  This should only occur
	 * during bootstrap or shared memory initialization.  Put an Assert here
	 * to catch unsafe coding practices.
	 */
	Assert(!(proc == NULL && IsUnderPostmaster));

	/* Ensure we will have room to remember the lock */
	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
		elog(ERROR, "too many LWLocks taken");

	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();

	/*
	 * Loop here to try to acquire lock after each time we are signaled by
	 * LWLockRelease.
	 *
	 * NOTE: it might seem better to have LWLockRelease actually grant us the
	 * lock, rather than retrying and possibly having to go back to sleep. But
	 * in practice that is no good because it means a process swap for every
	 * lock acquisition when two or more processes are contending for the same
	 * lock.  Since LWLocks are normally used to protect not-very-long
	 * sections of computation, a process needs to be able to acquire and
	 * release the same lock many times during a single CPU time slice, even
	 * in the presence of contention.  The efficiency of being able to do that
	 * outweighs the inefficiency of sometimes wasting a process dispatch
	 * cycle because the lock is not free when a released waiter finally gets
	 * to run.  See pgsql-hackers archives for 29-Dec-01.
	 */
	for (;;)
	{
		bool		mustwait;

		/*
		 * Try to grab the lock the first time, we're not in the waitqueue
		 * yet/anymore.
		 */
		mustwait = LWLockAttemptLock(lock, mode);

		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "immediately acquired lock");
			break;				/* got the lock */
		}

		/*
		 * Ok, at this point we couldn't grab the lock on the first try. We
		 * cannot simply queue ourselves to the end of the list and wait to be
		 * woken up because by now the lock could long have been released.
		 * Instead add us to the queue and try to grab the lock again. If we
		 * succeed we need to revert the queuing and be happy, otherwise we
		 * recheck the lock. If we still couldn't grab it, we know that the
		 * other locker will see our queue entries when releasing since they
		 * existed before we checked for the lock.
		 */

		/* add to the queue */
		LWLockQueueSelf(lock, mode);

		/* we're now guaranteed to be woken up if necessary */
		mustwait = LWLockAttemptLock(lock, mode);

		/* ok, grabbed the lock the second time round, need to undo queueing */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockAcquire", lock, "acquired, undoing queue");

			LWLockDequeueSelf(lock);
			break;
		}

		/*
		 * Wait until awakened.
		 *
		 * It is possible that we get awakened for a reason other than being
		 * signaled by LWLockRelease.  If so, loop back and wait again.  Once
		 * we've gotten the LWLock, re-increment the sema by the number of
		 * additional signals received.
		 */
		LOG_LWDEBUG("LWLockAcquire", lock, "waiting");

#ifdef LWLOCK_STATS
		lwstats->block_count++;
#endif

		LWLockReportWaitStart(lock);
		if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

		for (;;)
		{
			PGSemaphoreLock(proc->sem);
			if (proc->lwWaiting == LW_WS_NOT_WAITING)
				break;
			extraWaits++;
		}

		/* Retrying, allow LWLockRelease to release waiters again. */
		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

#ifdef LOCK_DEBUG
		{
			/* not waiting anymore */
			uint32		nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

			Assert(nwaiters < MAX_BACKENDS);
		}
#endif

		if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
		LWLockReportWaitEnd();

		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");

		/* Now loop back and try to acquire lock again. */
		result = false;
	}

	if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
		TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

	/* Add lock to list of locks held by this backend */
	held_lwlocks[num_held_lwlocks].lock = lock;
	held_lwlocks[num_held_lwlocks++].mode = mode;

	/*
	 * Fix the process wait semaphore's count for any absorbed wakeups.
	 */
	while (extraWaits-- > 0)
		PGSemaphoreUnlock(proc->sem);

	return result;
}

/*
 * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode
 *
 * If the lock is not available, return false with no side-effects.
 *
 * If successful, cancel/die interrupts are held off until lock release.
 */
bool
LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
{
	bool		mustwait;

	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);

	PRINT_LWDEBUG("LWLockConditionalAcquire", lock, mode);

	/* Ensure we will have room to remember the lock */
	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
		elog(ERROR, "too many LWLocks taken");

	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();

	/* Check for the lock */
	mustwait = LWLockAttemptLock(lock, mode);

	if (mustwait)
	{
		/* Failed to get lock, so release interrupt holdoff */
		RESUME_INTERRUPTS();

		LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
		if (TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
	}
	else
	{
		/* Add lock to list of locks held by this backend */
		held_lwlocks[num_held_lwlocks].lock = lock;
		held_lwlocks[num_held_lwlocks++].mode = mode;
		if (TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
	}
	return !mustwait;
}

/*
 * LWLockAcquireOrWait - Acquire lock, or wait until it's free
 *
 * The semantics of this function are a bit funky.  If the lock is currently
 * free, it is acquired in the given mode, and the function returns true.  If
 * the lock isn't immediately free, the function waits until it is released
 * and returns false, but does not acquire the lock.
 *
 * This is currently used for WALWriteLock: when a backend flushes the WAL,
 * holding WALWriteLock, it can flush the commit records of many other
 * backends as a side-effect.  Those other backends need to wait until the
 * flush finishes, but don't need to acquire the lock anymore.  They can just
 * wake up, observe that their records have already been flushed, and return.
 */
bool
LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
{
	PGPROC	   *proc = MyProc;
	bool		mustwait;
	int			extraWaits = 0;
#ifdef LWLOCK_STATS
	lwlock_stats *lwstats;

	lwstats = get_lwlock_stats_entry(lock);
#endif

	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);

	PRINT_LWDEBUG("LWLockAcquireOrWait", lock, mode);

	/* Ensure we will have room to remember the lock */
	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
		elog(ERROR, "too many LWLocks taken");

	/*
	 * Lock out cancel/die interrupts until we exit the code section protected
	 * by the LWLock.  This ensures that interrupts will not interfere with
	 * manipulations of data structures in shared memory.
	 */
	HOLD_INTERRUPTS();

	/*
	 * NB: We're using nearly the same twice-in-a-row lock acquisition
	 * protocol as LWLockAcquire(). Check its comments for details.
	 */
	mustwait = LWLockAttemptLock(lock, mode);

	if (mustwait)
	{
		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);

		mustwait = LWLockAttemptLock(lock, mode);

		if (mustwait)
		{
			/*
			 * Wait until awakened.  Like in LWLockAcquire, be prepared for
			 * bogus wakeups.
			 */
			LOG_LWDEBUG("LWLockAcquireOrWait", lock, "waiting");
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
|
|
|
#ifdef LWLOCK_STATS
|
2014-12-25 17:24:30 +01:00
|
|
|
lwstats->block_count++;
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
#endif
|
2016-03-10 18:44:09 +01:00
|
|
|
|
|
|
|
LWLockReportWaitStart(lock);
|
2021-05-03 12:11:33 +02:00
|
|
|
if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
|
|
|
|
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
for (;;)
|
|
|
|
{
|
Make the different Unix-y semaphore implementations ABI-compatible.
Previously, the "sem" field of PGPROC varied in size depending on which
kernel semaphore API we were using. That was okay as long as there was
only one likely choice per platform, but in the wake of commit ecb0d20a9,
that assumption seems rather shaky. It doesn't seem out of the question
anymore that an extension compiled against one API choice might be loaded
into a postmaster built with another choice. Moreover, this prevents any
possibility of selecting the semaphore API at postmaster startup, which
might be something we want to do in future.
Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix
semaphore APIs, and turn the pointed-to data into an opaque struct whose
contents are only known within the responsible modules.
For the SysV and unnamed-POSIX APIs, the pointed-to data has to be
allocated elsewhere in shared memory, which takes a little bit of
rejiggering of the InitShmemAllocation code sequence. (I invented a
ShmemAllocUnlocked() function to make that a little cleaner than it used
to be. That function is not meant for any uses other than the ones it
has now, but it beats having InitShmemAllocation() know explicitly about
allocation of space for semaphores and spinlocks.) This change means an
extra indirection to access the semaphore data, but since we only touch
that when blocking or awakening a process, there shouldn't be any
meaningful performance penalty. Moreover, at least for the unnamed-POSIX
case on Linux, the sem_t type is quite a bit wider than a pointer, so this
reduces sizeof(PGPROC) which seems like a good thing.
For the named-POSIX API, there's effectively no change: the PGPROC.sem
field was and still is a pointer to something returned by sem_open() in
the postmaster's memory space. Document and check the pre-existing
limitation that this case can't work in EXEC_BACKEND mode.
It did not seem worth unifying the Windows semaphore ABI with the Unix
cases, since there's no likelihood of needing ABI compatibility much less
runtime switching across those cases. However, we can simplify the Windows
code a bit if we define PGSemaphore as being directly a HANDLE, rather than
pointer to HANDLE, so let's do that while we're here. (This also ends up
being no change in what's physically stored in PGPROC.sem. We're just
moving the HANDLE fetch from callees to callers.)
It would take a bunch of additional code shuffling to get to the point of
actually choosing a semaphore API at postmaster start, but the effects
of that would now be localized in the port/XXX_sema.c files, so it seems
like fit material for a separate patch. The need for it is unproven as
yet, anyhow, whereas the ABI risk to extensions seems real enough.
Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us
2016-12-12 19:32:10 +01:00
|
|
|
PGSemaphoreLock(proc->sem);
|
2022-11-20 20:56:32 +01:00
|
|
|
if (proc->lwWaiting == LW_WS_NOT_WAITING)
|
2014-12-25 17:24:30 +01:00
|
|
|
break;
|
|
|
|
extraWaits++;
|
|
|
|
}
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
#ifdef LOCK_DEBUG
|
|
|
|
{
|
|
|
|
/* not waiting anymore */
|
2015-03-26 17:00:30 +01:00
|
|
|
uint32 nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);
|
2015-05-24 03:35:49 +02:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
Assert(nwaiters < MAX_BACKENDS);
|
|
|
|
}
|
|
|
|
#endif
|
2021-05-03 12:11:33 +02:00
|
|
|
if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
|
|
|
|
TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
|
2016-03-10 18:44:09 +01:00
|
|
|
LWLockReportWaitEnd();
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "acquired, undoing queue");
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
/*
|
|
|
|
* Got lock in the second attempt, undo queueing. We need to treat
|
|
|
|
* this as having successfully acquired the lock, otherwise we'd
|
|
|
|
* not necessarily wake up people we've prevented from acquiring
|
|
|
|
* the lock.
|
|
|
|
*/
|
|
|
|
LWLockDequeueSelf(lock);
|
|
|
|
}
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
}

	/*
	 * Fix the process wait semaphore's count for any absorbed wakeups.
	 */
	while (extraWaits-- > 0)
		PGSemaphoreUnlock(proc->sem);

	if (mustwait)
	{
		/* Failed to get lock, so release interrupt holdoff */
		RESUME_INTERRUPTS();
		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
		if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
	}
	else
	{
		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "succeeded");
		/* Add lock to list of locks held by this backend */
		held_lwlocks[num_held_lwlocks].lock = lock;
		held_lwlocks[num_held_lwlocks++].mode = mode;
		if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
	}

	return !mustwait;
}
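
/*
 * The acquire-or-wait contract above (grab the lock only if it is
 * immediately free, otherwise wait for its release and return false
 * WITHOUT taking it) can be illustrated with a minimal single-threaded
 * model.  This is a sketch only: toy_acquire_or_wait(), toy_flush(),
 * lock_held, and flushed_upto are hypothetical names, not part of the
 * lwlock API, and the "wait" is a no-op here.  It shows the XLogFlush-style
 * caller pattern: after a failed attempt, recheck whether another holder
 * already did the work before retrying.
 *
 *	#include <assert.h>
 *	#include <stdbool.h>
 *
 *	static bool lock_held = false;
 *	static int	flushed_upto = 0;
 *
 *	static bool
 *	toy_acquire_or_wait(void)
 *	{
 *		if (!lock_held)
 *		{
 *			lock_held = true;	// immediately free: grab it
 *			return true;
 *		}
 *		return false;			// held: caller must recheck its condition
 *	}
 *
 *	static void
 *	toy_flush(int target)
 *	{
 *		while (flushed_upto < target)
 *		{
 *			if (!toy_acquire_or_wait())
 *				continue;		// busy; loop rechecks flushed_upto
 *			flushed_upto = target;	// we hold the lock: do the flush
 *			lock_held = false;	// release
 *		}
 *	}
 *
 *	int
 *	main(void)
 *	{
 *		toy_flush(10);
 *		assert(flushed_upto == 10);
 *		assert(!lock_held);
 *		return 0;
 *	}
 */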

/*
 * Does the lwlock in its current state need to wait for the variable value to
 * change?
 *
 * If we don't need to wait, and it's because the value of the variable has
 * changed, store the current value in newval.
 *
 * *result is set to true if the lock was free, and false otherwise.
 */
static bool
LWLockConflictsWithVar(LWLock *lock,
					   uint64 *valptr, uint64 oldval, uint64 *newval,
					   bool *result)
{
	bool		mustwait;
	uint64		value;

	/*
	 * Test first to see if the slot is free right now.
	 *
	 * XXX: the caller uses a spinlock before this, so we don't need a memory
	 * barrier here as far as the current usage is concerned.  But that might
	 * not be safe in general.
	 */
	mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;

	if (!mustwait)
	{
		*result = true;
		return false;
	}

	*result = false;

	/*
	 * Read value using the lwlock's wait list lock, as we can't generally
	 * rely on atomic 64 bit reads/stores.  TODO: On platforms with a way to
	 * do atomic 64 bit reads/writes the spinlock should be optimized away.
	 */
	LWLockWaitListLock(lock);
	value = *valptr;
	LWLockWaitListUnlock(lock);

	if (value != oldval)
	{
		mustwait = false;
		*newval = value;
	}
	else
	{
		mustwait = true;
	}

	return mustwait;
}
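
/*
 * The three-way outcome of the conflict test above (lock free / variable
 * changed / must wait) can be sketched as a toy model without atomics or
 * wait-list locking.  All names here (toy_lock, toy_conflicts_with_var)
 * are hypothetical, used only to illustrate the protocol: the waiter passes
 * the value it last saw (oldval), and "no conflict" means either the lock
 * is free or the protected variable has moved on.
 *
 *	#include <assert.h>
 *	#include <stdbool.h>
 *	#include <stdint.h>
 *
 *	struct toy_lock
 *	{
 *		bool		exclusive;	// stands in for LW_VAL_EXCLUSIVE
 *		uint64_t	var;		// the protected variable
 *	};
 *
 *	// Returns true if the caller must wait.
 *	static bool
 *	toy_conflicts_with_var(struct toy_lock *lock, uint64_t oldval,
 *						   uint64_t *newval, bool *lock_free)
 *	{
 *		if (!lock->exclusive)
 *		{
 *			*lock_free = true;	// lock is free: no need to wait
 *			return false;
 *		}
 *		*lock_free = false;
 *		if (lock->var != oldval)
 *		{
 *			*newval = lock->var;	// variable changed: report new value
 *			return false;
 *		}
 *		return true;			// held and unchanged: must wait
 *	}
 *
 *	int
 *	main(void)
 *	{
 *		struct toy_lock lk = {.exclusive = true, .var = 42};
 *		uint64_t	newval = 0;
 *		bool		lock_free = false;
 *
 *		// Held and value unchanged: caller would have to wait.
 *		assert(toy_conflicts_with_var(&lk, 42, &newval, &lock_free));
 *
 *		// Holder advances the variable: waiter sees it, no wait needed.
 *		lk.var = 43;
 *		assert(!toy_conflicts_with_var(&lk, 42, &newval, &lock_free));
 *		assert(newval == 43 && !lock_free);
 *
 *		// Lock released: no wait, reported as free.
 *		lk.exclusive = false;
 *		assert(!toy_conflicts_with_var(&lk, 42, &newval, &lock_free));
 *		assert(lock_free);
 *		return 0;
 *	}
 */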

/*
 * LWLockWaitForVar - Wait until lock is free, or a variable is updated.
 *
 * If the lock is held and *valptr equals oldval, waits until the lock is
 * either freed, or the lock holder updates *valptr by calling
 * LWLockUpdateVar.  If the lock is free on exit (immediately or after
 * waiting), returns true.  If the lock is still held, but *valptr no longer
 * matches oldval, returns false and sets *newval to the current value in
 * *valptr.
 *
 * Note: this function ignores shared lock holders; if the lock is held
 * in shared mode, returns 'true'.
 */
bool
LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
{
	PGPROC	   *proc = MyProc;
	int			extraWaits = 0;
	bool		result = false;
#ifdef LWLOCK_STATS
	lwlock_stats *lwstats;

	lwstats = get_lwlock_stats_entry(lock);
#endif

	PRINT_LWDEBUG("LWLockWaitForVar", lock, LW_WAIT_UNTIL_FREE);

	/*
	 * Lock out cancel/die interrupts while we sleep on the lock.  There is
	 * no cleanup mechanism to remove us from the wait queue if we got
	 * interrupted.
	 */
	HOLD_INTERRUPTS();

	/*
	 * Loop here to check the lock's status after each time we are signaled.
	 */
	for (;;)
	{
		bool		mustwait;

		mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
										  &result);

		if (!mustwait)
			break;				/* the lock was free or value didn't match */

		/*
		 * Add myself to wait queue.  Note that this is racy, somebody else
		 * could wakeup before we're finished queuing.  NB: We're using
		 * nearly the same twice-in-a-row lock acquisition protocol as
		 * LWLockAcquire().  Check its comments for details.  The only
		 * difference is that we also have to check the variable's values
		 * when checking the state of the lock.
		 */
		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);

		/*
		 * Set RELEASE_OK flag, to make sure we get woken up as soon as the
		 * lock is released.
		 */
		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);

		/*
		 * We're now guaranteed to be woken up if necessary.  Recheck the
		 * lock and variables state.
		 */
		mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
										  &result);

		/* Ok, no conflict after we queued ourselves.  Undo queueing. */
		if (!mustwait)
		{
			LOG_LWDEBUG("LWLockWaitForVar", lock, "free, undoing queue");

			LWLockDequeueSelf(lock);
			break;
		}

		/*
		 * Wait until awakened.
		 *
		 * It is possible that we get awakened for a reason other than being
		 * signaled by LWLockRelease.  If so, loop back and wait again.  Once
		 * we've gotten the LWLock, re-increment the sema by the number of
		 * additional signals received.
		 */
		LOG_LWDEBUG("LWLockWaitForVar", lock, "waiting");

#ifdef LWLOCK_STATS
		lwstats->block_count++;
#endif

		LWLockReportWaitStart(lock);
		if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);

		for (;;)
		{
			PGSemaphoreLock(proc->sem);
			if (proc->lwWaiting == LW_WS_NOT_WAITING)
				break;
			extraWaits++;
		}

#ifdef LOCK_DEBUG
		{
			/* not waiting anymore */
			uint32		nwaiters PG_USED_FOR_ASSERTS_ONLY = pg_atomic_fetch_sub_u32(&lock->nwaiters, 1);

			Assert(nwaiters < MAX_BACKENDS);
		}
#endif

		if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
		LWLockReportWaitEnd();

		LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");

		/* Now loop back and check the status of the lock again. */
	}

	/*
	 * Fix the process wait semaphore's count for any absorbed wakeups.
	 */
	while (extraWaits-- > 0)
Make the different Unix-y semaphore implementations ABI-compatible.
Previously, the "sem" field of PGPROC varied in size depending on which
kernel semaphore API we were using. That was okay as long as there was
only one likely choice per platform, but in the wake of commit ecb0d20a9,
that assumption seems rather shaky. It doesn't seem out of the question
anymore that an extension compiled against one API choice might be loaded
into a postmaster built with another choice. Moreover, this prevents any
possibility of selecting the semaphore API at postmaster startup, which
might be something we want to do in future.
Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix
semaphore APIs, and turn the pointed-to data into an opaque struct whose
contents are only known within the responsible modules.
For the SysV and unnamed-POSIX APIs, the pointed-to data has to be
allocated elsewhere in shared memory, which takes a little bit of
rejiggering of the InitShmemAllocation code sequence. (I invented a
ShmemAllocUnlocked() function to make that a little cleaner than it used
to be. That function is not meant for any uses other than the ones it
has now, but it beats having InitShmemAllocation() know explicitly about
allocation of space for semaphores and spinlocks.) This change means an
extra indirection to access the semaphore data, but since we only touch
that when blocking or awakening a process, there shouldn't be any
meaningful performance penalty. Moreover, at least for the unnamed-POSIX
case on Linux, the sem_t type is quite a bit wider than a pointer, so this
reduces sizeof(PGPROC) which seems like a good thing.
For the named-POSIX API, there's effectively no change: the PGPROC.sem
field was and still is a pointer to something returned by sem_open() in
the postmaster's memory space. Document and check the pre-existing
limitation that this case can't work in EXEC_BACKEND mode.
It did not seem worth unifying the Windows semaphore ABI with the Unix
cases, since there's no likelihood of needing ABI compatibility, much less
runtime switching across those cases. However, we can simplify the Windows
code a bit if we define PGSemaphore as being directly a HANDLE, rather than
pointer to HANDLE, so let's do that while we're here. (This also ends up
being no change in what's physically stored in PGPROC.sem. We're just
moving the HANDLE fetch from callees to callers.)
It would take a bunch of additional code shuffling to get to the point of
actually choosing a semaphore API at postmaster start, but the effects
of that would now be localized in the port/XXX_sema.c files, so it seems
like fit material for a separate patch. The need for it is unproven as
yet, anyhow, whereas the ABI risk to extensions seems real enough.
Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us
2016-12-12 19:32:10 +01:00
|
|
|
PGSemaphoreUnlock(proc->sem);
|
2014-03-21 15:06:08 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Now okay to allow cancel/die interrupts.
|
|
|
|
*/
|
|
|
|
RESUME_INTERRUPTS();
|
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* LWLockUpdateVar - Update a variable and wake up waiters atomically
|
|
|
|
*
|
|
|
|
* Sets *valptr to 'val', and wakes up all processes waiting for us with
|
|
|
|
* LWLockWaitForVar(). Setting the value and waking up the processes happen
|
|
|
|
* atomically so that any process calling LWLockWaitForVar() on the same lock
|
|
|
|
* is guaranteed to see the new value, and act accordingly.
|
|
|
|
*
|
|
|
|
* The caller must be holding the lock in exclusive mode.
|
|
|
|
*/
|
|
|
|
void
|
2014-09-22 22:42:14 +02:00
|
|
|
LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
|
2014-03-21 15:06:08 +01:00
|
|
|
{
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_head wakeup;
|
|
|
|
proclist_mutable_iter iter;
|
2014-12-25 17:24:30 +01:00
|
|
|
|
|
|
|
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
|
2014-12-25 17:24:30 +01:00
|
|
|
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_init(&wakeup);
|
2014-03-21 15:06:08 +01:00
|
|
|
|
2016-04-11 05:12:32 +02:00
|
|
|
LWLockWaitListLock(lock);
|
2014-03-21 15:06:08 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
|
2014-03-21 15:06:08 +01:00
|
|
|
|
|
|
|
/* Update the lock's value */
|
2014-09-22 22:42:14 +02:00
|
|
|
*valptr = val;
|
2014-03-21 15:06:08 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
|
|
|
|
* up. They are always in the front of the queue.
|
|
|
|
*/
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
|
2014-03-21 15:06:08 +01:00
|
|
|
{
|
2016-08-16 00:09:55 +02:00
|
|
|
PGPROC *waiter = GetPGProcByNumber(iter.cur);
|
2014-12-25 17:24:30 +01:00
|
|
|
|
|
|
|
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
|
|
|
|
break;
|
2014-03-21 15:06:08 +01:00
|
|
|
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
|
|
|
|
proclist_push_tail(&wakeup, iter.cur, lwWaitLink);
|
2022-11-20 20:56:32 +01:00
|
|
|
|
|
|
|
/* see LWLockWakeup() */
|
|
|
|
Assert(waiter->lwWaiting == LW_WS_WAITING);
|
|
|
|
waiter->lwWaiting = LW_WS_PENDING_WAKEUP;
|
2014-03-21 15:06:08 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* We are done updating shared state of the lock itself. */
|
2016-04-11 05:12:32 +02:00
|
|
|
LWLockWaitListUnlock(lock);
|
2014-03-21 15:06:08 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Awaken any waiters I removed from the queue.
|
|
|
|
*/
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_foreach_modify(iter, &wakeup, lwWaitLink)
|
2014-03-21 15:06:08 +01:00
|
|
|
{
|
2016-08-16 00:09:55 +02:00
|
|
|
PGPROC *waiter = GetPGProcByNumber(iter.cur);
|
2015-05-24 03:35:49 +02:00
|
|
|
|
2016-08-16 00:09:55 +02:00
|
|
|
proclist_delete(&wakeup, iter.cur, lwWaitLink);
|
2014-12-25 17:24:30 +01:00
|
|
|
/* check comment in LWLockWakeup() about this barrier */
|
2014-12-19 14:29:52 +01:00
|
|
|
pg_write_barrier();
|
2022-11-20 20:56:32 +01:00
|
|
|
waiter->lwWaiting = LW_WS_NOT_WAITING;
|
2016-12-12 19:32:10 +01:00
|
|
|
PGSemaphoreUnlock(waiter->sem);
|
2014-03-21 15:06:08 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2001-09-29 06:02:27 +02:00
|
|
|
/*
|
|
|
|
* LWLockRelease - release a previously acquired lock
|
|
|
|
*/
|
|
|
|
void
|
2014-09-22 22:42:14 +02:00
|
|
|
LWLockRelease(LWLock *lock)
|
2001-09-29 06:02:27 +02:00
|
|
|
{
|
2014-12-25 17:24:30 +01:00
|
|
|
LWLockMode mode;
|
|
|
|
uint32 oldstate;
|
|
|
|
bool check_waiters;
|
2001-09-29 06:02:27 +02:00
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove lock from list of locks held. Usually, but not always, it will
|
|
|
|
* be the latest-acquired lock; so search array backwards.
|
|
|
|
*/
|
|
|
|
for (i = num_held_lwlocks; --i >= 0;)
|
2014-12-25 17:24:30 +01:00
|
|
|
if (lock == held_lwlocks[i].lock)
|
2001-09-29 06:02:27 +02:00
|
|
|
break;
|
2016-12-07 05:02:38 +01:00
|
|
|
|
2001-09-29 06:02:27 +02:00
|
|
|
if (i < 0)
|
2016-12-16 17:29:23 +01:00
|
|
|
elog(ERROR, "lock %s is not held", T_NAME(lock));
|
2016-12-07 05:02:38 +01:00
|
|
|
|
|
|
|
mode = held_lwlocks[i].mode;
|
|
|
|
|
2001-09-29 06:02:27 +02:00
|
|
|
num_held_lwlocks--;
|
|
|
|
for (; i < num_held_lwlocks; i++)
|
|
|
|
held_lwlocks[i] = held_lwlocks[i + 1];
|
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
PRINT_LWDEBUG("LWLockRelease", lock, mode);
|
2001-09-29 06:02:27 +02:00
|
|
|
|
|
|
|
/*
|
2014-12-25 17:24:30 +01:00
|
|
|
* Release my hold on lock, after that it can immediately be acquired by
|
|
|
|
* others, even if we still have to wakeup other waiters.
|
2001-09-29 06:02:27 +02:00
|
|
|
*/
|
2014-12-25 17:24:30 +01:00
|
|
|
if (mode == LW_EXCLUSIVE)
|
|
|
|
oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_EXCLUSIVE);
|
|
|
|
else
|
|
|
|
oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
|
Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and they can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.
Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 15:40:58 +01:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
/* nobody else can have that kind of lock */
|
|
|
|
Assert(!(oldstate & LW_VAL_EXCLUSIVE));
|
2001-09-29 06:02:27 +02:00
|
|
|
|
2021-05-03 12:11:33 +02:00
|
|
|
if (TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED())
|
|
|
|
TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
|
2001-09-29 06:02:27 +02:00
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
/*
|
|
|
|
* We're still waiting for backends to get scheduled, don't wake them up
|
|
|
|
* again.
|
|
|
|
*/
|
|
|
|
if ((oldstate & (LW_FLAG_HAS_WAITERS | LW_FLAG_RELEASE_OK)) ==
|
|
|
|
(LW_FLAG_HAS_WAITERS | LW_FLAG_RELEASE_OK) &&
|
|
|
|
(oldstate & LW_LOCK_MASK) == 0)
|
|
|
|
check_waiters = true;
|
|
|
|
else
|
|
|
|
check_waiters = false;
|
2006-07-24 18:32:45 +02:00
|
|
|
|
2001-09-29 06:02:27 +02:00
|
|
|
/*
|
2014-12-25 17:24:30 +01:00
|
|
|
* As waking up waiters requires the spinlock to be acquired, only do so
|
|
|
|
* if necessary.
|
2001-09-29 06:02:27 +02:00
|
|
|
*/
|
2014-12-25 17:24:30 +01:00
|
|
|
if (check_waiters)
|
2001-09-29 06:02:27 +02:00
|
|
|
{
|
2014-12-25 17:24:30 +01:00
|
|
|
/* XXX: remove before commit? */
|
|
|
|
LOG_LWDEBUG("LWLockRelease", lock, "releasing waiters");
|
|
|
|
LWLockWakeup(lock);
|
2001-09-29 06:02:27 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now okay to allow cancel/die interrupts.
|
|
|
|
*/
|
|
|
|
RESUME_INTERRUPTS();
|
|
|
|
}
|
|
|
|
|
2015-07-31 20:20:43 +02:00
|
|
|
/*
|
|
|
|
* LWLockReleaseClearVar - release a previously acquired lock, reset variable
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val)
|
|
|
|
{
|
2016-04-11 05:12:32 +02:00
|
|
|
LWLockWaitListLock(lock);
|
2015-07-31 20:20:43 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set the variable's value before releasing the lock; that prevents
|
|
|
|
* a race condition wherein a new locker acquires the lock, but hasn't yet
|
|
|
|
* set the variable's value.
|
|
|
|
*/
|
|
|
|
*valptr = val;
|
2016-04-11 05:12:32 +02:00
|
|
|
LWLockWaitListUnlock(lock);
|
2015-07-31 20:20:43 +02:00
|
|
|
|
|
|
|
LWLockRelease(lock);
|
|
|
|
}
|
|
|
|
|
2001-09-29 06:02:27 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* LWLockReleaseAll - release all currently-held locks
|
|
|
|
*
|
2003-07-25 00:04:15 +02:00
|
|
|
* Used to clean up after ereport(ERROR). An important difference between this
|
2001-09-29 06:02:27 +02:00
|
|
|
* function and retail LWLockRelease calls is that InterruptHoldoffCount is
|
|
|
|
* unchanged by this operation. This is necessary since InterruptHoldoffCount
|
2003-07-25 00:04:15 +02:00
|
|
|
* has been set to an appropriate level earlier in error recovery. We could
|
2001-09-29 06:02:27 +02:00
|
|
|
* decrement it below zero if we allow it to drop for each released lock!
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
LWLockReleaseAll(void)
|
|
|
|
{
|
|
|
|
while (num_held_lwlocks > 0)
|
|
|
|
{
|
|
|
|
HOLD_INTERRUPTS(); /* match the upcoming RESUME_INTERRUPTS */
|
|
|
|
|
2014-12-25 17:24:30 +01:00
|
|
|
LWLockRelease(held_lwlocks[num_held_lwlocks - 1].lock);
|
2001-09-29 06:02:27 +02:00
|
|
|
}
|
|
|
|
}
|
2004-06-11 18:43:24 +02:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
2016-09-05 11:38:08 +02:00
|
|
|
* LWLockHeldByMe - test whether my process holds a lock in any mode
|
2004-06-11 18:43:24 +02:00
|
|
|
*
|
2016-09-05 11:38:08 +02:00
|
|
|
* This is meant as debug support only.
|
2004-06-11 18:43:24 +02:00
|
|
|
*/
|
|
|
|
bool
|
2022-09-20 04:18:36 +02:00
|
|
|
LWLockHeldByMe(LWLock *lock)
|
2004-06-11 18:43:24 +02:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < num_held_lwlocks; i++)
|
|
|
|
{
|
2022-09-20 04:18:36 +02:00
|
|
|
if (held_lwlocks[i].lock == lock)
|
2004-06-11 18:43:24 +02:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
2016-09-05 11:38:08 +02:00
|
|
|
|
2022-07-11 04:47:16 +02:00
|
|
|
/*
|
|
|
|
* LWLockAnyHeldByMe - test whether my process holds any of an array of locks
|
|
|
|
*
|
|
|
|
* This is meant as debug support only.
|
|
|
|
*/
|
|
|
|
bool
|
2022-09-20 04:18:36 +02:00
|
|
|
LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride)
|
2022-07-11 04:47:16 +02:00
|
|
|
{
|
|
|
|
char *held_lock_addr;
|
|
|
|
char *begin;
|
|
|
|
char *end;
|
|
|
|
int i;
|
|
|
|
|
2022-09-20 04:18:36 +02:00
|
|
|
begin = (char *) lock;
|
2022-07-11 04:47:16 +02:00
|
|
|
end = begin + nlocks * stride;
|
|
|
|
for (i = 0; i < num_held_lwlocks; i++)
|
|
|
|
{
|
|
|
|
held_lock_addr = (char *) held_lwlocks[i].lock;
|
|
|
|
if (held_lock_addr >= begin &&
|
|
|
|
held_lock_addr < end &&
|
|
|
|
(held_lock_addr - begin) % stride == 0)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-09-05 11:38:08 +02:00
|
|
|
/*
|
|
|
|
* LWLockHeldByMeInMode - test whether my process holds a lock in given mode
|
|
|
|
*
|
|
|
|
* This is meant as debug support only.
|
|
|
|
*/
|
|
|
|
bool
|
2022-09-20 04:18:36 +02:00
|
|
|
LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode)
|
2016-09-05 11:38:08 +02:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < num_held_lwlocks; i++)
|
|
|
|
{
|
2022-09-20 04:18:36 +02:00
|
|
|
if (held_lwlocks[i].lock == lock && held_lwlocks[i].mode == mode)
|
2016-09-05 11:38:08 +02:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|