Locking tuples
--------------

Locking tuples is not as easy as locking tables or other database objects.
The problem is that transactions might want to lock large numbers of tuples at
any one time, so it's not possible to keep the lock objects in shared memory.
To work around this limitation, we use a two-level mechanism. The first level
is implemented by storing locking information in the tuple header: a tuple is
marked as locked by setting the current transaction's XID as its XMAX, and
setting additional infomask bits to distinguish this case from the more normal
case of having deleted the tuple. When multiple transactions concurrently
lock a tuple, a MultiXact is used; see below. This mechanism can accommodate
arbitrarily large numbers of tuples being locked simultaneously.
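
For illustration, the first level is visible from SQL through the hidden xmax
system column. A sketch, assuming a hypothetical one-row table t:

    BEGIN;
    SELECT xmax FROM t WHERE id = 1;           -- typically 0: no valid XMAX
    SELECT id FROM t WHERE id = 1 FOR UPDATE;  -- stores our XID in XMAX
    SELECT xmax FROM t WHERE id = 1;           -- now shows this transaction's XID
    COMMIT;

The infomask bits that distinguish a lock from a delete are not shown by this
column; see "Infomask Bits" below.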

When it is necessary to wait for a tuple-level lock to be released, the basic
delay is provided by XactLockTableWait or MultiXactIdWait on the contents of
the tuple's XMAX. However, that mechanism will release all waiters
concurrently, so there would be a race condition as to which waiter gets the
tuple, potentially leading to indefinite starvation of some waiters. The
possibility of share-locking makes the problem much worse --- a steady stream
of share-lockers can easily block an exclusive locker forever. To provide
more reliable semantics about who gets a tuple-level lock first, we use the
standard lock manager, which implements the second level mentioned above. The
protocol for waiting for a tuple-level lock is really

    LockTuple()
    XactLockTableWait()
    mark tuple as locked by me
    UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next
is provided by LockTuple(). However, at most one tuple-level lock will
be held or awaited per backend at any time, so we don't risk overflow
of the lock table. Note that incoming share-lockers are required to
do LockTuple as well, if there is any conflict, to ensure that they don't
starve out waiting exclusive-lockers. However, if there is not any active
conflict for a tuple, we don't incur any extra overhead.
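
These transient heavyweight locks can be observed in pg_locks, where they
appear with locktype 'tuple'. A sketch, to be run while one session holds a
row lock and another waits on it:

    SELECT locktype, relation::regclass, page, tuple, pid, granted
    FROM pg_locks
    WHERE locktype = 'tuple';

Because uncontended lockers skip LockTuple() entirely, entries show up only
while some backend is actually queued behind a conflicting lock.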

We make an exception to the above rule for those lockers that already hold
some lock on a tuple and attempt to acquire a stronger one on it. In that
case, we skip the LockTuple() call even when there are conflicts, provided
that the target tuple is being locked, updated or deleted by multiple sessions
concurrently. Failing to skip the lock would risk a deadlock, e.g., between a
session that was first to record its weaker lock in the tuple header and would
be waiting on the LockTuple() call to upgrade to the stronger lock level, and
another session that has already done LockTuple() and is waiting for the first
session transaction to release its tuple header-level lock.
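
The spurious deadlock this exception avoids can be reproduced with three
sessions on a hypothetical job table (names are illustrative):

    T1: SELECT id FROM job WHERE name = 'a' FOR KEY SHARE;
    Y:  SELECT id FROM job WHERE name = 'a' FOR UPDATE;  -- waits for T1
    Z:  SELECT id FROM job WHERE name = 'a' FOR KEY SHARE;
    T1: UPDATE job SET name = 'b' WHERE id = 1;          -- upgrades T1's lock
    Z:  UPDATE job SET name = 'c' WHERE id = 1;          -- waits for T1
    T1: ROLLBACK;

Without the exception, Y would hold the heavyweight tuple lock while waiting
for the XMAX that Z is part of, Z would queue behind Y's heavyweight lock, and
a deadlock would be reported among the waiters once T1 finishes.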

We provide four levels of tuple locking strength: SELECT FOR UPDATE obtains an
exclusive lock which prevents any kind of modification of the tuple. This is
the lock level that is implicitly taken by DELETE operations, and also by
UPDATE operations if they modify any of the tuple's key fields. SELECT FOR NO
KEY UPDATE likewise obtains an exclusive lock, but only prevents tuple removal
and modifications which might alter the tuple's key. This is the lock that is
implicitly taken by UPDATE operations which leave all key fields unchanged.
SELECT FOR SHARE obtains a shared lock which prevents any kind of tuple
modification. Finally, SELECT FOR KEY SHARE obtains a shared lock which only
prevents tuple removal and modifications of key fields. This lock level is
just strong enough to implement RI checks, i.e. it ensures that tuples do not
go away from under a check, without blocking transactions that want to update
the tuple without changing its key.
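
For example, on a hypothetical table parent(id int primary key, note text),
an RI-style check and a non-key UPDATE can run concurrently:

    -- Session 1: the lock an RI trigger effectively takes on the PK row
    BEGIN;
    SELECT id FROM parent WHERE id = 1 FOR KEY SHARE;

    -- Session 2: a non-key UPDATE takes FOR NO KEY UPDATE strength
    UPDATE parent SET note = 'x' WHERE id = 1;   -- does not block

    -- Session 2: a key change takes FOR UPDATE strength instead
    UPDATE parent SET id = 2 WHERE id = 1;       -- blocks until session 1 ends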

The conflict table is:

                UPDATE         NO KEY UPDATE  SHARE          KEY SHARE
UPDATE          conflict       conflict       conflict       conflict
NO KEY UPDATE   conflict       conflict       conflict
SHARE           conflict       conflict
KEY SHARE       conflict
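
Reading one row of the table, again on the hypothetical parent table:

    -- Session 1:
    BEGIN;
    SELECT id FROM parent WHERE id = 1 FOR SHARE;

    -- Session 2: blocks; SHARE conflicts with NO KEY UPDATE
    UPDATE parent SET note = 'y' WHERE id = 1;

    -- Session 3: does not block; SHARE and KEY SHARE are compatible
    SELECT id FROM parent WHERE id = 1 FOR KEY SHARE;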

When there is a single locker in a tuple, we can just store the locking info
in the tuple itself. We do this by storing the locker's Xid in XMAX, and
setting infomask bits specifying the locking strength. There is one exception
here: since infomask space is limited, we do not provide a separate bit
for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
presumably more commonly used due to being the standards-mandated locking
mechanism, or heavily used by the RI code, so we want to provide fast paths
for those.)
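
The contrib pgrowlocks extension makes this visible; a sketch on the
hypothetical table t, run while another session holds SELECT ... FOR SHARE on
a row:

    CREATE EXTENSION IF NOT EXISTS pgrowlocks;
    SELECT locked_row, locker, multi, modes FROM pgrowlocks('t');
    -- multi is expected to be true even for a single share-locker, whereas
    -- a lone FOR UPDATE or FOR KEY SHARE locker reports multi = false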

MultiXacts
----------

A tuple header provides very limited space for storing information about tuple
locking and updates: there is room only for a single Xid and a small number of
infomask bits. Whenever we need to store more than one lock, we replace the
first locker's Xid with a new MultiXactId. Each MultiXact provides extended
locking data; it comprises an array of Xids plus some flag bits for each one.
The flags are currently used to store the locking strength of each member
transaction. (The flags also distinguish a pure locker from an updater.)
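
A MultiXact's members can be listed with pg_get_multixact_members(), given a
MultiXactId read from some tuple's xmax (the value below is illustrative):

    SELECT * FROM pg_get_multixact_members('100'::xid);
    -- one row per member: its xid and its lock/update mode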

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
locked in shared mode by multiple transactions. This is no longer the case; a
MultiXact may contain an update or delete Xid. (Keep in mind that tuple locks
in a transaction do not conflict with other tuple locks in the same
transaction, so it's possible to have otherwise conflicting locks in a
MultiXact if they belong to the same transaction).

Note that each lock is attributed to the subtransaction that acquires it.
This means that a subtransaction that aborts is seen as though it releases the
locks it acquired; concurrent transactions can then proceed without having to
wait for the main transaction to finish. It also means that a subtransaction
can upgrade to a stronger lock level than an earlier transaction had, and if
the subxact aborts, the earlier, weaker lock is kept.
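
A sketch of the upgrade-and-abort case, again on the hypothetical table t:

    BEGIN;
    SELECT id FROM t WHERE id = 1 FOR KEY SHARE;  -- weak lock, top-level xact
    SAVEPOINT s;
    SELECT id FROM t WHERE id = 1 FOR UPDATE;     -- subxact upgrades the lock
    ROLLBACK TO s;  -- the strong lock is released; the FOR KEY SHARE
                    -- lock acquired before the savepoint is kept
    COMMIT;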

The possibility of having an update within a MultiXact means that they must
persist across crashes and restarts: a future reader of the tuple needs to
figure out whether the update committed or aborted. So we have a requirement
that pg_multixact needs to retain pages of its data until we're certain that
the MultiXacts in them are no longer of interest.

VACUUM is in charge of removing old MultiXacts at the time of tuple freezing.
The lower bound used by vacuum (that is, the value below which all multixacts
are removed) is stored as pg_class.relminmxid for each table; the minimum of
all such values is stored in pg_database.datminmxid. The minimum across
all databases, in turn, is recorded in checkpoint records, and CHECKPOINT
removes pg_multixact/ segments older than that value once the checkpoint
record has been flushed.
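
Both bounds are visible in the catalogs; for instance, for the hypothetical
table t and for all databases:

    SELECT relname, relminmxid FROM pg_class WHERE relname = 't';
    SELECT datname, datminmxid FROM pg_database;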

Infomask Bits
-------------

The following infomask bits are applicable:

- HEAP_XMAX_INVALID
  Any tuple with this bit set does not have a valid value stored in XMAX.

- HEAP_XMAX_IS_MULTI
  This bit is set if the tuple's Xmax is a MultiXactId (as opposed to a
  regular TransactionId).

- HEAP_XMAX_LOCK_ONLY
  This bit is set when the XMAX is a locker only; that is, if it's a
  multixact, it does not contain an update among its members. It's set when
  the XMAX is a plain Xid that locked the tuple, as well.

- HEAP_XMAX_KEYSHR_LOCK
- HEAP_XMAX_SHR_LOCK
- HEAP_XMAX_EXCL_LOCK
  These bits indicate the strength of the lock acquired; they are useful when
  the XMAX is not a MultiXactId. If it's a multi, the info is to be found in
  the member flags. If HEAP_XMAX_IS_MULTI is not set and HEAP_XMAX_LOCK_ONLY
  is set, then one of these *must* be set as well.

  Note that HEAP_XMAX_EXCL_LOCK does not distinguish FOR NO KEY UPDATE from
  FOR UPDATE; this is implemented by the HEAP_KEYS_UPDATED bit.

- HEAP_KEYS_UPDATED
  This bit lives in t_infomask2. If set, it indicates that the XMAX updated
  this tuple and changed the key values, or it deleted the tuple.
  It's set regardless of whether the XMAX is a TransactionId or a MultiXactId.

We currently never set HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
is set.
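
The contrib pageinspect extension can decode these bits from a raw page; a
sketch, with mask values taken from access/htup_details.h (assumed current
--- check the header for your release), on the hypothetical table t:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    SELECT lp, t_xmax,
           (t_infomask  & x'1000'::int) <> 0 AS xmax_is_multi,   -- HEAP_XMAX_IS_MULTI
           (t_infomask  & x'0080'::int) <> 0 AS xmax_lock_only,  -- HEAP_XMAX_LOCK_ONLY
           (t_infomask2 & x'2000'::int) <> 0 AS keys_updated     -- HEAP_KEYS_UPDATED
    FROM heap_page_items(get_raw_page('t', 0));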