postgresql/src/backend/utils/adt/lockfuncs.c

/*-------------------------------------------------------------------------
*
* lockfuncs.c
* Functions for SQL access to various lock-manager capabilities.
*
* Copyright (c) 2002-2020, PostgreSQL Global Development Group
*
* IDENTIFICATION
* src/backend/utils/adt/lockfuncs.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/htup_details.h"
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/predicate_internals.h"
#include "utils/array.h"
#include "utils/builtins.h"
/* This must match enum LockTagType! */
const char *const LockTagTypeNames[] = {
"relation",
"extend",
"page",
"tuple",
"transactionid",
"virtualxid",
"speculative token",
"object",
"userlock",
"advisory"
};
StaticAssertDecl(lengthof(LockTagTypeNames) == (LOCKTAG_ADVISORY + 1),
"array length mismatch");
/* This must match enum PredicateLockTargetType (predicate_internals.h) */
static const char *const PredicateLockTagTypeNames[] = {
"relation",
"page",
"tuple"
};
StaticAssertDecl(lengthof(PredicateLockTagTypeNames) == (PREDLOCKTAG_TUPLE + 1),
"array length mismatch");
/* Working status for pg_lock_status */
typedef struct
{
LockData *lockData; /* state data from lmgr */
int currIdx; /* current PROCLOCK index */
PredicateLockData *predLockData; /* state data for pred locks */
int predLockIdx; /* current index for pred lock */
} PG_Lock_Status;
/* Number of columns in pg_locks output */
#define NUM_LOCK_STATUS_COLUMNS 15
/*
* VXIDGetDatum - Construct a text representation of a VXID
*
* This is currently only used in pg_lock_status, so we put it here.
*/
static Datum
VXIDGetDatum(BackendId bid, LocalTransactionId lxid)
{
/*
* The representation is "<bid>/<lxid>", decimal and unsigned decimal
* respectively. Note that elog.c also knows how to format a vxid.
*/
char vxidstr[32];
snprintf(vxidstr, sizeof(vxidstr), "%d/%u", bid, lxid);
return CStringGetTextDatum(vxidstr);
}
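/*
 * Illustrative note (added commentary, not from the original source): with
 * the "%d/%u" format above, backend ID 3 holding local transaction ID 1234
 * is rendered as "3/1234", the form shown in pg_locks.virtualtransaction.
 */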
/*
* pg_lock_status - produce a view with one row per held or awaited lock mode
*/
Datum
pg_lock_status(PG_FUNCTION_ARGS)
{
FuncCallContext *funcctx;
PG_Lock_Status *mystatus;
LockData *lockData;
PredicateLockData *predLockData;
if (SRF_IS_FIRSTCALL())
{
TupleDesc tupdesc;
MemoryContext oldcontext;
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
/*
* switch to memory context appropriate for multiple function calls
*/
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
/* build tupdesc for result tuples */
/* this had better match function's declaration in pg_proc.h */
tupdesc = CreateTemplateTupleDesc(NUM_LOCK_STATUS_COLUMNS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "locktype",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "database",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 3, "relation",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "page",
INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 5, "tuple",
INT2OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 6, "virtualxid",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 7, "transactionid",
XIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 8, "classid",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "objid",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "objsubid",
INT2OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 11, "virtualtransaction",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 12, "pid",
INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 13, "mode",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 14, "granted",
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
/*
* Collect all the locking information that we will format and send
* out as a result set.
*/
mystatus = (PG_Lock_Status *) palloc(sizeof(PG_Lock_Status));
funcctx->user_fctx = (void *) mystatus;
mystatus->lockData = GetLockStatusData();
mystatus->currIdx = 0;
mystatus->predLockData = GetPredicateLockStatusData();
mystatus->predLockIdx = 0;
MemoryContextSwitchTo(oldcontext);
}
funcctx = SRF_PERCALL_SETUP();
mystatus = (PG_Lock_Status *) funcctx->user_fctx;
lockData = mystatus->lockData;
while (mystatus->currIdx < lockData->nelements)
{
bool granted;
LOCKMODE mode = 0;
const char *locktypename;
char tnbuf[32];
Datum values[NUM_LOCK_STATUS_COLUMNS];
bool nulls[NUM_LOCK_STATUS_COLUMNS];
HeapTuple tuple;
Datum result;
LockInstanceData *instance;
instance = &(lockData->locks[mystatus->currIdx]);
/*
* Look to see if there are any held lock modes in this PROCLOCK. If
* so, report, and destructively modify lockData so we don't report
* again.
*/
granted = false;
if (instance->holdMask)
{
for (mode = 0; mode < MAX_LOCKMODES; mode++)
{
if (instance->holdMask & LOCKBIT_ON(mode))
{
granted = true;
instance->holdMask &= LOCKBIT_OFF(mode);
break;
}
}
}
/*
* If no (more) held modes to report, see if PROC is waiting for a
* lock on this lock.
*/
if (!granted)
{
if (instance->waitLockMode != NoLock)
{
/* Yes, so report it with proper mode */
mode = instance->waitLockMode;
/*
* We are now done with this PROCLOCK, so advance pointer to
* continue with next one on next call.
*/
mystatus->currIdx++;
}
else
{
/*
* Okay, we've displayed all the locks associated with this
* PROCLOCK, proceed to the next one.
*/
mystatus->currIdx++;
continue;
}
}
/*
* Form tuple with appropriate data.
*/
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
if (instance->locktag.locktag_type <= LOCKTAG_LAST_TYPE)
locktypename = LockTagTypeNames[instance->locktag.locktag_type];
else
{
snprintf(tnbuf, sizeof(tnbuf), "unknown %d",
(int) instance->locktag.locktag_type);
locktypename = tnbuf;
}
values[0] = CStringGetTextDatum(locktypename);
switch ((LockTagType) instance->locktag.locktag_type)
{
case LOCKTAG_RELATION:
case LOCKTAG_RELATION_EXTEND:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_PAGE:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[3] = UInt32GetDatum(instance->locktag.locktag_field3);
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_TUPLE:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[3] = UInt32GetDatum(instance->locktag.locktag_field3);
values[4] = UInt16GetDatum(instance->locktag.locktag_field4);
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_TRANSACTION:
values[6] =
TransactionIdGetDatum(instance->locktag.locktag_field1);
nulls[1] = true;
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_VIRTUALTRANSACTION:
values[5] = VXIDGetDatum(instance->locktag.locktag_field1,
instance->locktag.locktag_field2);
nulls[1] = true;
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_OBJECT:
case LOCKTAG_USERLOCK:
case LOCKTAG_ADVISORY:
default: /* treat unknown locktags like OBJECT */
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[7] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[8] = ObjectIdGetDatum(instance->locktag.locktag_field3);
values[9] = Int16GetDatum(instance->locktag.locktag_field4);
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
break;
}
values[10] = VXIDGetDatum(instance->backend, instance->lxid);
if (instance->pid != 0)
values[11] = Int32GetDatum(instance->pid);
else
nulls[11] = true;
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
SRF_RETURN_NEXT(funcctx, result);
}
/*
* Have returned all regular locks. Now start on the SIREAD predicate
* locks.
*/
predLockData = mystatus->predLockData;
if (mystatus->predLockIdx < predLockData->nelements)
{
PredicateLockTargetType lockType;
PREDICATELOCKTARGETTAG *predTag = &(predLockData->locktags[mystatus->predLockIdx]);
SERIALIZABLEXACT *xact = &(predLockData->xacts[mystatus->predLockIdx]);
Datum values[NUM_LOCK_STATUS_COLUMNS];
bool nulls[NUM_LOCK_STATUS_COLUMNS];
HeapTuple tuple;
Datum result;
mystatus->predLockIdx++;
/*
* Form tuple with appropriate data.
*/
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
/* lock type */
lockType = GET_PREDICATELOCKTARGETTAG_TYPE(*predTag);
values[0] = CStringGetTextDatum(PredicateLockTagTypeNames[lockType]);
/* lock target */
values[1] = GET_PREDICATELOCKTARGETTAG_DB(*predTag);
values[2] = GET_PREDICATELOCKTARGETTAG_RELATION(*predTag);
if (lockType == PREDLOCKTAG_TUPLE)
values[4] = GET_PREDICATELOCKTARGETTAG_OFFSET(*predTag);
else
nulls[4] = true;
if ((lockType == PREDLOCKTAG_TUPLE) ||
(lockType == PREDLOCKTAG_PAGE))
values[3] = GET_PREDICATELOCKTARGETTAG_PAGE(*predTag);
else
nulls[3] = true;
/* these fields are targets for other types of locks */
nulls[5] = true; /* virtualxid */
nulls[6] = true; /* transactionid */
nulls[7] = true; /* classid */
nulls[8] = true; /* objid */
nulls[9] = true; /* objsubid */
/* lock holder */
values[10] = VXIDGetDatum(xact->vxid.backendId,
xact->vxid.localTransactionId);
if (xact->pid != 0)
values[11] = Int32GetDatum(xact->pid);
else
nulls[11] = true;
/*
* Lock mode. Currently all predicate locks are SIReadLocks, which are
* always held (never waiting) and have no fast path
*/
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
SRF_RETURN_NEXT(funcctx, result);
}
SRF_RETURN_DONE(funcctx);
}
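/*
 * Usage sketch (added commentary, not part of the original file): this
 * set-returning function backs the pg_locks system view, so a typical way
 * to inspect ungranted lock requests from SQL is, assuming the standard
 * view definition,
 *
 *     SELECT locktype, relation::regclass, mode, granted
 *     FROM pg_locks
 *     WHERE NOT granted;
 */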
/*
* pg_blocking_pids - produce an array of the PIDs blocking given PID
*
* The reported PIDs are those that hold a lock conflicting with blocked_pid's
* current request (hard block), or are requesting such a lock and are ahead
* of blocked_pid in the lock's wait queue (soft block).
*
* In parallel-query cases, we report all PIDs blocking any member of the
* given PID's lock group, and the reported PIDs are those of the blocking
* PIDs' lock group leaders. This allows callers to compare the result to
* lists of clients' pg_backend_pid() results even during a parallel query.
*
* Parallel query makes it possible for there to be duplicate PIDs in the
* result (either because multiple waiters are blocked by same PID, or
* because multiple blockers have same group leader PID). We do not bother
* to eliminate such duplicates from the result.
*
* We need not consider predicate locks here, since those don't block anything.
*/
Datum
pg_blocking_pids(PG_FUNCTION_ARGS)
{
int blocked_pid = PG_GETARG_INT32(0);
Datum *arrayelems;
int narrayelems;
BlockedProcsData *lockData; /* state data from lmgr */
int i,
j;
/* Collect a snapshot of lock manager state */
lockData = GetBlockerStatusData(blocked_pid);
/* We can't need more output entries than there are reported PROCLOCKs */
arrayelems = (Datum *) palloc(lockData->nlocks * sizeof(Datum));
narrayelems = 0;
/* For each blocked proc in the lock group ... */
for (i = 0; i < lockData->nprocs; i++)
{
BlockedProcData *bproc = &lockData->procs[i];
LockInstanceData *instances = &lockData->locks[bproc->first_lock];
int *preceding_waiters = &lockData->waiter_pids[bproc->first_waiter];
LockInstanceData *blocked_instance;
LockMethod lockMethodTable;
int conflictMask;
/*
* Locate the blocked proc's own entry in the LockInstanceData array.
* There should be exactly one matching entry.
*/
blocked_instance = NULL;
for (j = 0; j < bproc->num_locks; j++)
{
LockInstanceData *instance = &(instances[j]);
if (instance->pid == bproc->pid)
{
Assert(blocked_instance == NULL);
blocked_instance = instance;
}
}
Assert(blocked_instance != NULL);
lockMethodTable = GetLockTagsMethodTable(&(blocked_instance->locktag));
conflictMask = lockMethodTable->conflictTab[blocked_instance->waitLockMode];
/* Now scan the PROCLOCK data for conflicting procs */
for (j = 0; j < bproc->num_locks; j++)
{
LockInstanceData *instance = &(instances[j]);
/* A proc never blocks itself, so ignore that entry */
if (instance == blocked_instance)
continue;
/* Members of same lock group never block each other, either */
if (instance->leaderPid == blocked_instance->leaderPid)
continue;
if (conflictMask & instance->holdMask)
{
/* hard block: blocked by lock already held by this entry */
}
else if (instance->waitLockMode != NoLock &&
(conflictMask & LOCKBIT_ON(instance->waitLockMode)))
{
/* conflict in lock requests; who's in front in wait queue? */
bool ahead = false;
int k;
for (k = 0; k < bproc->num_waiters; k++)
{
if (preceding_waiters[k] == instance->pid)
{
/* soft block: this entry is ahead of blocked proc */
ahead = true;
break;
}
}
if (!ahead)
continue; /* not blocked by this entry */
}
else
{
/* not blocked by this entry */
continue;
}
/* blocked by this entry, so emit a record */
arrayelems[narrayelems++] = Int32GetDatum(instance->leaderPid);
}
}
/* Assert we didn't overrun arrayelems[] */
Assert(narrayelems <= lockData->nlocks);
/* Construct array, using hardwired knowledge about int4 type */
PG_RETURN_ARRAYTYPE_P(construct_array(arrayelems, narrayelems,
INT4OID,
sizeof(int32), true, TYPALIGN_INT));
}
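/*
 * Usage sketch (added commentary): combined with pg_stat_activity, this
 * function shows which sessions are blocked and by whom; the column names
 * below assume the standard pg_stat_activity definition.
 *
 *     SELECT pid, pg_blocking_pids(pid) AS blocked_by
 *     FROM pg_stat_activity
 *     WHERE cardinality(pg_blocking_pids(pid)) > 0;
 */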
/*
* pg_safe_snapshot_blocking_pids - produce an array of the PIDs blocking
* given PID from getting a safe snapshot
*
* XXX this does not consider parallel-query cases; not clear how big a
* problem that is in practice
*/
Datum
pg_safe_snapshot_blocking_pids(PG_FUNCTION_ARGS)
{
int blocked_pid = PG_GETARG_INT32(0);
int *blockers;
int num_blockers;
Datum *blocker_datums;
/* A buffer big enough for any possible blocker list without truncation */
blockers = (int *) palloc(MaxBackends * sizeof(int));
/* Collect a snapshot of processes waited for by GetSafeSnapshot */
num_blockers =
GetSafeSnapshotBlockingPids(blocked_pid, blockers, MaxBackends);
/* Convert int array to Datum array */
if (num_blockers > 0)
{
int i;
blocker_datums = (Datum *) palloc(num_blockers * sizeof(Datum));
for (i = 0; i < num_blockers; ++i)
blocker_datums[i] = Int32GetDatum(blockers[i]);
}
else
blocker_datums = NULL;
/* Construct array, using hardwired knowledge about int4 type */
PG_RETURN_ARRAYTYPE_P(construct_array(blocker_datums, num_blockers,
INT4OID,
sizeof(int32), true, TYPALIGN_INT));
}
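/*
 * Usage sketch (added commentary): useful when diagnosing a SERIALIZABLE
 * READ ONLY DEFERRABLE transaction that is waiting for a safe snapshot,
 * e.g.
 *
 *     SELECT pg_safe_snapshot_blocking_pids(1234);   -- 1234 is a sample PID
 */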
/*
* pg_isolation_test_session_is_blocked - support function for isolationtester
*
* Check if specified PID is blocked by any of the PIDs listed in the second
* argument. Currently, this looks for blocking caused by waiting for
* heavyweight locks or safe snapshots. We ignore blockage caused by PIDs
* not directly under the isolationtester's control, eg autovacuum.
*
* This is an undocumented function intended for use by the isolation tester,
* and may change in future releases as required for testing purposes.
*/
Datum
pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
{
int blocked_pid = PG_GETARG_INT32(0);
ArrayType *interesting_pids_a = PG_GETARG_ARRAYTYPE_P(1);
ArrayType *blocking_pids_a;
int32 *interesting_pids;
int32 *blocking_pids;
int num_interesting_pids;
int num_blocking_pids;
int dummy;
int i,
j;
/* Validate the passed-in array */
Assert(ARR_ELEMTYPE(interesting_pids_a) == INT4OID);
if (array_contains_nulls(interesting_pids_a))
elog(ERROR, "array must not contain nulls");
interesting_pids = (int32 *) ARR_DATA_PTR(interesting_pids_a);
num_interesting_pids = ArrayGetNItems(ARR_NDIM(interesting_pids_a),
ARR_DIMS(interesting_pids_a));
/*
* Get the PIDs of all sessions blocking the given session's attempt to
* acquire heavyweight locks.
*/
blocking_pids_a =
DatumGetArrayTypeP(DirectFunctionCall1(pg_blocking_pids, blocked_pid));
Assert(ARR_ELEMTYPE(blocking_pids_a) == INT4OID);
Assert(!array_contains_nulls(blocking_pids_a));
blocking_pids = (int32 *) ARR_DATA_PTR(blocking_pids_a);
num_blocking_pids = ArrayGetNItems(ARR_NDIM(blocking_pids_a),
ARR_DIMS(blocking_pids_a));
/*
* Check if any of these are in the list of interesting PIDs, that being
* the sessions that the isolation tester is running. We don't use
* "arrayoverlaps" here, because it would lead to cache lookups and one of
* our goals is to run quickly under CLOBBER_CACHE_ALWAYS. We expect
* blocking_pids to be usually empty and otherwise a very small number in
* isolation tester cases, so make that the outer loop of a naive search
* for a match.
*/
for (i = 0; i < num_blocking_pids; i++)
for (j = 0; j < num_interesting_pids; j++)
{
if (blocking_pids[i] == interesting_pids[j])
PG_RETURN_BOOL(true);
}
/*
* Check if blocked_pid is waiting for a safe snapshot. We could in
* theory check the resulting array of blocker PIDs against the
* interesting PIDs whitelist, but since there is no danger of autovacuum
* blocking GetSafeSnapshot there seems to be no point in expending cycles
* on allocating a buffer and searching for overlap; so it's presently
* sufficient for the isolation tester's purposes to use a single element
* buffer and check if the number of safe snapshot blockers is non-zero.
*/
if (GetSafeSnapshotBlockingPids(blocked_pid, &dummy, 1) > 0)
PG_RETURN_BOOL(true);
PG_RETURN_BOOL(false);
}
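/*
 * A minimal usage sketch (hypothetical libpq client, loosely modeled on the
 * isolation tester; conn and the parameter values are assumed to be set up
 * elsewhere): pass the PID of the session being checked and an int4[]
 * literal listing every tester-controlled backend PID.
 *
 *		const char *paramValues[2] = {"12345", "{12345,12346,12347}"};
 *		PGresult   *res;
 *
 *		res = PQexecParams(conn,
 *						   "SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, $2)",
 *						   2, NULL, paramValues, NULL, NULL, 0);
 */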
/*
* Functions for manipulating advisory locks
*
* We make use of the locktag fields as follows:
*
* field1: MyDatabaseId ... ensures locks are local to each database
* field2: first of 2 int4 keys, or high-order half of an int8 key
* field3: second of 2 int4 keys, or low-order half of an int8 key
* field4: 1 if using an int8 key, 2 if using 2 int4 keys
*/
#define SET_LOCKTAG_INT64(tag, key64) \
SET_LOCKTAG_ADVISORY(tag, \
MyDatabaseId, \
(uint32) ((key64) >> 32), \
(uint32) (key64), \
1)
#define SET_LOCKTAG_INT32(tag, key1, key2) \
SET_LOCKTAG_ADVISORY(tag, MyDatabaseId, key1, key2, 2)
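/*
 * A minimal sketch (hypothetical values, illustration only) of how an int8
 * key is spread across the advisory locktag: the high-order 32 bits land in
 * field2, the low-order 32 bits in field3, and field4 records which keyspace
 * is in use (1 for int8 keys, 2 for pairs of int4 keys).
 *
 *		LOCKTAG		tag;
 *		int64		key = ((int64) 7 << 32) | 42;
 *
 *		SET_LOCKTAG_INT64(tag, key);
 *		Assert(tag.locktag_field2 == 7);
 *		Assert(tag.locktag_field3 == 42);
 *		Assert(tag.locktag_field4 == 1);
 */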
/*
* pg_advisory_lock(int8) - acquire exclusive lock on an int8 key
*/
Datum
pg_advisory_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ExclusiveLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock(int8) - acquire xact scoped
* exclusive lock on an int8 key
*/
Datum
pg_advisory_xact_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ExclusiveLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_lock_shared(int8) - acquire share lock on an int8 key
*/
Datum
pg_advisory_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ShareLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock_shared(int8) - acquire xact scoped
* share lock on an int8 key
*/
Datum
pg_advisory_xact_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ShareLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_try_advisory_lock(int8) - acquire exclusive lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ExclusiveLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
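/*
 * A minimal sketch (hypothetical caller; key is assumed to be supplied
 * elsewhere) of the conditional-acquire pattern the pg_try_* variants wrap:
 * with dontWait set to true, LockAcquire returns LOCKACQUIRE_NOT_AVAIL
 * instead of sleeping when the lock is held in a conflicting mode.
 *
 *		LOCKTAG		tag;
 *
 *		SET_LOCKTAG_INT64(tag, key);
 *		if (LockAcquire(&tag, ExclusiveLock, true, true) ==
 *			LOCKACQUIRE_NOT_AVAIL)
 *			elog(NOTICE, "advisory lock " INT64_FORMAT " is busy", key);
 */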
/*
* pg_try_advisory_xact_lock(int8) - acquire xact scoped
* exclusive lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ExclusiveLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_lock_shared(int8) - acquire share lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ShareLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock_shared(int8) - acquire xact scoped
* share lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ShareLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_advisory_unlock(int8) - release exclusive lock on an int8 key
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
bool res;
SET_LOCKTAG_INT64(tag, key);
res = LockRelease(&tag, ExclusiveLock, true);
PG_RETURN_BOOL(res);
}
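/*
 * A minimal sketch (hypothetical caller; tag and key as in the examples
 * above) of pairing a session-level acquire with its release: LockRelease
 * returns false when no matching lock is held, which is what the SQL-level
 * false result reports.
 *
 *		SET_LOCKTAG_INT64(tag, key);
 *		(void) LockAcquire(&tag, ExclusiveLock, true, false);
 *		... work while holding the advisory lock ...
 *		if (!LockRelease(&tag, ExclusiveLock, true))
 *			elog(WARNING, "advisory lock " INT64_FORMAT " was not held", key);
 */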
/*
* pg_advisory_unlock_shared(int8) - release share lock on an int8 key
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
bool res;
SET_LOCKTAG_INT64(tag, key);
res = LockRelease(&tag, ShareLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_lock(int4, int4) - acquire exclusive lock on 2 int4 keys
*/
Datum
pg_advisory_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ExclusiveLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock(int4, int4) - acquire xact scoped
* exclusive lock on 2 int4 keys
*/
Datum
pg_advisory_xact_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ExclusiveLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_lock_shared(int4, int4) - acquire share lock on 2 int4 keys
*/
Datum
pg_advisory_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ShareLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock_shared(int4, int4) - acquire xact scoped
* share lock on 2 int4 keys
*/
Datum
pg_advisory_xact_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ShareLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_try_advisory_lock(int4, int4) - acquire exclusive lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ExclusiveLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock(int4, int4) - acquire xact scoped
* exclusive lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ExclusiveLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_lock_shared(int4, int4) - acquire share lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ShareLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock_shared(int4, int4) - acquire xact scoped
* share lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ShareLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_advisory_unlock(int4, int4) - release exclusive lock on 2 int4 keys
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
bool res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockRelease(&tag, ExclusiveLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_unlock_shared(int4, int4) - release share lock on 2 int4 keys
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
bool res;
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockRelease(&tag, ShareLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_unlock_all() - release all advisory locks
*/
Datum
pg_advisory_unlock_all(PG_FUNCTION_ARGS)
{
LockReleaseSession(USER_LOCKMETHOD);
PG_RETURN_VOID();
}