$PostgreSQL: pgsql/src/backend/storage/lmgr/README,v 1.19 2005/12/11 21:02:18 tgl Exp $


LOCKING OVERVIEW

Postgres uses three types of interprocess locks:

* Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock.  Spinlocks are primarily used as infrastructure for lightweight
locks.  They are implemented using a hardware atomic-test-and-set
instruction, if available.  Waiting processes busy-loop until they can
get the lock.  There is no provision for deadlock detection, automatic
release on error, or any other nicety.  There is a timeout if the lock
cannot be gotten after a minute or so (which is approximately forever in
comparison to the intended lock hold time, so this is certainly an error
condition).  (A minimal sketch of this busy-loop behavior appears just
after this list.)

* Lightweight locks (LWLocks).  These locks are typically used to
interlock access to data structures in shared memory.  LWLocks support
both exclusive and shared lock modes (for read/write and read-only
access to a shared object).  There is no provision for deadlock
detection, but the LWLock manager will automatically release held
LWLocks during elog() recovery, so it is safe to raise an error while
holding LWLocks.  Obtaining or releasing an LWLock is quite fast (a few
dozen instructions) when there is no contention for the lock.  When a
process has to wait for an LWLock, it blocks on a SysV semaphore so as
to not consume CPU time.  Waiting processes will be granted the lock in
arrival order.  There is no timeout.

* Regular locks (a/k/a heavyweight locks).  The regular lock manager
supports a variety of lock modes with table-driven semantics, and it has
full deadlock detection and automatic release at transaction end.
Regular locks should be used for all user-driven lock requests.
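
As a concrete illustration of the spinlock behavior described in the first
bullet above, here is a minimal, self-contained C sketch built on C11
atomics.  It is only a loose analogue of the backend's real implementation
in s_lock.h/s_lock.c (which uses per-platform test-and-set instructions and
also enforces the timeout mentioned above); the type and function names are
made up for this example.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the backend's slock_t. */
    typedef struct
    {
        atomic_flag held;
    } my_spinlock;

    static void spin_init(my_spinlock *lock)
    {
        atomic_flag_clear(&lock->held);
    }

    static void spin_acquire(my_spinlock *lock)
    {
        /* Busy-loop on an atomic test-and-set until we find the flag clear.
         * The real code also counts iterations and reports a "stuck
         * spinlock" after a timeout. */
        while (atomic_flag_test_and_set(&lock->held))
            ;
    }

    static void spin_release(my_spinlock *lock)
    {
        atomic_flag_clear(&lock->held);
    }

    int main(void)
    {
        my_spinlock lock;

        spin_init(&lock);
        spin_acquire(&lock);
        /* ...at most a few dozen instructions of critical section here... */
        spin_release(&lock);
        printf("spinlock acquired and released\n");
        return 0;
    }
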
Acquisition of either a spinlock or a lightweight lock causes query
cancel and die() interrupts to be held off until all such locks are
released.  No such restriction exists for regular locks, however.  Also
note that we can accept query cancel and die() interrupts while waiting
for a regular lock, but we will not accept them while waiting for
spinlocks or LW locks.  It is therefore not a good idea to use LW locks
when the wait time might exceed a few seconds.

The rest of this README file discusses the regular lock manager in detail.


LOCK DATA STRUCTURES

Lock methods describe the overall locking behavior.  Currently there are
two lock methods: DEFAULT and USER.  (USER locks are non-blocking.)

Lock modes describe the type of the lock (read/write or shared/exclusive).
In principle, each lock method can have its own set of lock modes with
different conflict rules, but currently DEFAULT and USER methods use
identical lock mode sets.  See src/tools/backend/index.html and
src/include/storage/lock.h for more details.  (Lock modes are also called
lock types in some places in the code and documentation.)

There are two fundamental lock structures in shared memory: the
per-lockable-object LOCK struct, and the per-lock-and-requestor PROCLOCK
struct.  A LOCK object exists for each lockable object that currently has
locks held or requested on it.  A PROCLOCK struct exists for each backend
that is holding or requesting lock(s) on each LOCK object.

In addition to these, each backend maintains an unshared LOCALLOCK structure
for each lockable object and lock mode that it is currently holding or
requesting.  The shared lock structures only allow a single lock grant to
be made per lockable object/lock mode/backend.  Internally to a backend,
however, the same lock may be requested and perhaps released multiple times
in a transaction, and it can also be held both transactionally and session-
wide.  The internal request counts are held in LOCALLOCK so that the shared
data structures need not be accessed to alter them.
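
To illustrate why that local bookkeeping pays off, here is a small,
self-contained sketch.  The struct and function names are hypothetical
(the real LOCALLOCK in lock.h carries considerably more state); the point
is simply that repeat acquisitions by the same backend only bump a local
counter, and only the first acquisition and the last release have to touch
the shared tables.

    #include <stdio.h>

    /* Hypothetical, simplified stand-in for a LOCALLOCK entry. */
    typedef struct
    {
        int  lockmode;     /* which lock mode this entry tracks */
        long nLocks;       /* how many times this backend holds it */
    } my_local_lock;

    /* Pretend shared-memory operations; the real ones update the shared
     * LOCK and PROCLOCK structures under the lock manager's LWLock. */
    static void shared_grant(int mode)   { printf("shared memory: grant mode %d\n", mode); }
    static void shared_release(int mode) { printf("shared memory: release mode %d\n", mode); }

    static void local_acquire(my_local_lock *ll)
    {
        if (ll->nLocks++ == 0)
            shared_grant(ll->lockmode);      /* only the first acquisition */
    }

    static void local_release(my_local_lock *ll)
    {
        if (--ll->nLocks == 0)
            shared_release(ll->lockmode);    /* only the last release */
    }

    int main(void)
    {
        my_local_lock ll = { 1, 0 };

        local_acquire(&ll);    /* touches shared memory */
        local_acquire(&ll);    /* purely local */
        local_release(&ll);    /* purely local */
        local_release(&ll);    /* touches shared memory */
        return 0;
    }
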
---------------------------------------------------------------------------

The lock manager's LOCK objects contain:

tag -
    The key fields that are used for hashing locks in the shared memory
    lock hash table.  The contents of the tag essentially define an
    individual lockable object.  See include/storage/lock.h for details
    about the supported types of lockable objects.  This is declared as
    a separate struct to ensure that we always zero out the correct number
    of bytes.  It is critical that any alignment-padding bytes the compiler
    might insert in the struct be zeroed out, else the hash computation
    will be random.  (Currently, we are careful to define struct LOCKTAG
    so that there are no padding bytes.)

grantMask -
    This bitmask indicates what types of locks are currently held on the
    given lockable object.  It is used (against the lock table's conflict
    table) to determine if a new lock request will conflict with existing
    lock types held.  Conflicts are determined by bitwise AND operations
    between the grantMask and the conflict table entry for the requested
    lock type.  Bit i of grantMask is 1 if and only if granted[i] > 0.
    (A small worked example of this bookkeeping follows the field
    descriptions below.)

waitMask -
    This bitmask shows the types of locks being waited for.  Bit i of
    waitMask is 1 if and only if requested[i] > granted[i].

procLocks -
    This is a shared memory queue of all the PROCLOCK structs associated
    with the lock object.  Note that both granted and waiting PROCLOCKs
    are in this list (indeed, the same PROCLOCK might have some
    already-granted locks and be waiting for more!).

waitProcs -
    This is a shared memory queue of all PGPROC structures corresponding to
    backends that are waiting (sleeping) until another backend releases this
    lock.  The process structure holds the information needed to determine
    if it should be woken up when the lock is released.

nRequested -
    Keeps a count of how many times acquisition of this lock has been
    attempted.  The count includes attempts by processes which were put
    to sleep due to conflicts.  It also counts the same backend twice
    if, for example, a backend process first acquires a read and then
    acquires a write.  (But multiple acquisitions of the same lock/lock mode
    within a backend are not multiply counted here; they are recorded
    only in the backend's LOCALLOCK structure.)

requested -
    Keeps a count of how many locks of each type have been attempted.  Only
    elements 1 through MAX_LOCKMODES-1 are used, as they correspond to the
    defined lock-type constants.  Summing the values of requested[] should
    come out equal to nRequested.

nGranted -
    Keeps count of how many times this lock has been successfully acquired.
    This count does not include attempts that are waiting due to conflicts.
    Otherwise the counting rules are the same as for nRequested.

granted -
    Keeps count of how many locks of each type are currently held.  Once
    again only elements 1 through MAX_LOCKMODES-1 are used (0 is not).
    Also, like requested[], summing the values of granted[] should total
    to the value of nGranted.

We should always have 0 <= nGranted <= nRequested, and
0 <= granted[i] <= requested[i] for each i.  When all the request counts
go to zero, the LOCK object is no longer needed and can be freed.
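
The mask and count bookkeeping above can be summarized in a few lines of C.
This is only an illustrative sketch with a made-up two-mode conflict table
and simplified names, not the real definitions in lock.h:

    #include <assert.h>
    #include <stdio.h>

    #define MAX_LOCKMODES 3              /* modes 1..2 used; element 0 unused */
    #define LOCKBIT_ON(m) (1U << (m))

    typedef unsigned int LOCKMASK;

    /* Hypothetical conflict table: mode 1 = "share", mode 2 = "exclusive". */
    static const LOCKMASK conflictTab[MAX_LOCKMODES] = {
        0,
        LOCKBIT_ON(2),                   /* share conflicts with exclusive */
        LOCKBIT_ON(1) | LOCKBIT_ON(2)    /* exclusive conflicts with both  */
    };

    typedef struct
    {
        LOCKMASK grantMask;              /* bit i set iff granted[i] > 0 */
        LOCKMASK waitMask;               /* bit i set iff requested[i] > granted[i] */
        int      requested[MAX_LOCKMODES];
        int      nRequested;
        int      granted[MAX_LOCKMODES];
        int      nGranted;
    } mini_lock;

    /* A request conflicts if any already-granted mode is in its conflict set. */
    static int conflicts(const mini_lock *lock, int mode)
    {
        return (conflictTab[mode] & lock->grantMask) != 0;
    }

    static void check_invariants(const mini_lock *lock)
    {
        assert(0 <= lock->nGranted && lock->nGranted <= lock->nRequested);
        for (int i = 1; i < MAX_LOCKMODES; i++)
            assert(0 <= lock->granted[i] &&
                   lock->granted[i] <= lock->requested[i]);
    }

    int main(void)
    {
        mini_lock lock = {0};

        /* Grant a share lock (mode 1). */
        lock.requested[1]++; lock.nRequested++;
        lock.granted[1]++;   lock.nGranted++;
        lock.grantMask |= LOCKBIT_ON(1);

        /* An exclusive request (mode 2) now conflicts, so it must wait. */
        lock.requested[2]++; lock.nRequested++;
        if (conflicts(&lock, 2))
            lock.waitMask |= LOCKBIT_ON(2);

        check_invariants(&lock);
        printf("exclusive request %s\n",
               conflicts(&lock, 2) ? "must wait" : "can be granted");
        return 0;
    }
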
---------------------------------------------------------------------------

The lock manager's PROCLOCK objects contain:

tag -
    The key fields that are used for hashing entries in the shared memory
    PROCLOCK hash table.  This is declared as a separate struct to ensure
    that we always zero out the correct number of bytes.

    tag.lock
        SHMEM offset of the LOCK object this PROCLOCK is for.

    tag.proc
        SHMEM offset of PGPROC of backend process that owns this PROCLOCK.

holdMask -
    A bitmask for the lock modes successfully acquired by this PROCLOCK.
    This should be a subset of the LOCK object's grantMask, and also a
    subset of the PGPROC object's heldLocks mask (if the PGPROC is
    currently waiting for another lock mode on this lock).

releaseMask -
    A bitmask for the lock modes due to be released during LockReleaseAll.
    This must be a subset of the holdMask.  Note that it is modified without
    taking the partition LWLock, and therefore it is unsafe for any
    backend except the one owning the PROCLOCK to examine/change it.

lockLink -
    List link for shared memory queue of all the PROCLOCK objects for the
    same LOCK.

procLink -
    List link for shared memory queue of all the PROCLOCK objects for the
    same backend.

---------------------------------------------------------------------------


LOCK MANAGER INTERNAL LOCKING

Before PostgreSQL 8.2, all of the shared-memory data structures used by
the lock manager were protected by a single LWLock, the LockMgrLock;
any operation involving these data structures had to exclusively lock
LockMgrLock.  Not too surprisingly, this became a contention bottleneck.
To reduce contention, the lock manager's data structures have been split
into multiple "partitions", each protected by an independent LWLock.
Most operations only need to lock the single partition they are working in.
Here are the details:

* Each possible lock is assigned to one partition according to a hash of
its LOCKTAG value (see LockTagToPartition()).  The partition's LWLock is
considered to protect all the LOCK objects of that partition as well as
their subsidiary PROCLOCKs.  The shared-memory hash tables for LOCKs and
PROCLOCKs are divided into separate hash tables for each partition, and
operations on each hash table are likewise protected by the partition
lock.  (See the sketch later in this section.)

* Formerly, each PGPROC had a single list of PROCLOCKs belonging to it.
This has now been split into per-partition lists, so that access to a
particular PROCLOCK list can be protected by the associated partition's
LWLock.  (This is not strictly necessary at the moment, because at this
writing a PGPROC's PROCLOCK list is only accessed by the owning backend
anyway.  But it seems forward-looking to maintain a convention for how
other backends could access it.  In any case LockReleaseAll needs to be
able to quickly determine which partition each LOCK belongs to, and
for the currently contemplated number of partitions, this way takes less
shared memory than explicitly storing a partition number in LOCK structs
would require.)

* The other lock-related fields of a PGPROC are only interesting when
the PGPROC is waiting for a lock, so we consider that they are protected
by the partition LWLock of the awaited lock.

For normal lock acquisition and release, it is sufficient to lock the
partition containing the desired lock.  Deadlock checking needs to touch
multiple partitions in general; for simplicity, we just make it lock all
the partitions in partition-number order.  (To prevent LWLock deadlock,
we establish the rule that any backend needing to lock more than one
partition at once must lock them in partition-number order.)  It's
possible that deadlock checking could be done without touching every
partition in typical cases, but since in a properly functioning system
deadlock checking should not occur often enough to be performance-critical,
trying to make this work does not seem a productive use of effort.
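
Here is a compact sketch of those two conventions: assigning a lock to a
partition by hashing its tag, and taking multiple partition locks only in
partition-number order.  Everything here is an illustrative stand-in (a toy
tag type, an FNV-1a hash, pthread mutexes instead of LWLocks, and 16
partitions picked arbitrarily); it is not the actual LockTagToPartition()
code.

    /* Compile with: cc -pthread partition_demo.c */
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LOCK_PARTITIONS 16        /* arbitrary illustrative value */

    /* Toy stand-in for a LOCKTAG: just a bag of key fields. */
    typedef struct
    {
        uint32_t dbId;
        uint32_t relId;
        uint32_t objId;
    } mini_locktag;

    static pthread_mutex_t partition_lock[NUM_LOCK_PARTITIONS];

    /* Hypothetical stand-in for LockTagToPartition(): hash the whole tag,
     * then take the result modulo the partition count. */
    static int tag_to_partition(const mini_locktag *tag)
    {
        const unsigned char *p = (const unsigned char *) tag;
        uint32_t h = 2166136261u;                  /* FNV-1a */

        for (size_t i = 0; i < sizeof(*tag); i++)
            h = (h ^ p[i]) * 16777619u;
        return (int) (h % NUM_LOCK_PARTITIONS);
    }

    /* When every partition must be visited (as deadlock checking does),
     * always take the locks in partition-number order, so that two
     * backends can never deadlock on the partition locks themselves. */
    static void lock_all_partitions(void)
    {
        for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
            pthread_mutex_lock(&partition_lock[i]);
    }

    static void unlock_all_partitions(void)
    {
        for (int i = NUM_LOCK_PARTITIONS - 1; i >= 0; i--)
            pthread_mutex_unlock(&partition_lock[i]);
    }

    int main(void)
    {
        mini_locktag tag = { 1, 16384, 0 };
        int          partition;

        for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
            pthread_mutex_init(&partition_lock[i], NULL);

        /* Normal acquisition: lock only the one partition involved. */
        partition = tag_to_partition(&tag);
        pthread_mutex_lock(&partition_lock[partition]);
        printf("tag hashes to partition %d\n", partition);
        pthread_mutex_unlock(&partition_lock[partition]);

        /* Deadlock check: lock every partition, in order. */
        lock_all_partitions();
        unlock_all_partitions();
        return 0;
    }
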
A backend's internal LOCALLOCK hash table is not partitioned.  We do store
the partition number in LOCALLOCK table entries, but this is a straight
speed-for-space tradeoff: we could instead recalculate the partition
number from the LOCKTAG when needed.


THE DEADLOCK DETECTION ALGORITHM

Since we allow user transactions to request locks in any order, deadlock
is possible.  We use a deadlock detection/breaking algorithm that is
fairly standard in essence, but there are many special considerations
needed to deal with Postgres' generalized locking model.

A key design consideration is that we want to make routine operations
(lock grant and release) run quickly when there is no deadlock, and
avoid the overhead of deadlock handling as much as possible.  We do this
using an "optimistic waiting" approach: if a process cannot acquire the
lock it wants immediately, it goes to sleep without any deadlock check.
But it also sets a delay timer, with a delay of DeadlockTimeout
milliseconds (typically set to one second).  If the delay expires before
the process is granted the lock it wants, it runs the deadlock
detection/breaking code.  Normally this code will determine that there is
no deadlock condition, and then the process will go back to sleep and
wait quietly until it is granted the lock.  But if a deadlock condition
does exist, it will be resolved, usually by aborting the detecting
process' transaction.  In this way, we avoid deadlock handling overhead
whenever the wait time for a lock is less than DeadlockTimeout, while
not imposing an unreasonable delay in detection when there is an error.
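
The shape of that protocol is sketched below using portable POSIX
primitives: a condition variable with a timed wait stands in for the
backend's semaphore-plus-timer machinery, and check_deadlock() is a
placeholder for the real deadlock search.  All of the names here are
invented for the example.

    /* Compile with: cc -pthread optimistic_wait_demo.c */
    #include <errno.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool lock_granted = false;   /* set by the releasing process */

    /* Placeholder for the real deadlock check; returns true if a deadlock
     * was found and this process was chosen as the victim. */
    static bool check_deadlock(void) { return false; }

    /* Sleep until the lock is granted.  If deadlock_timeout_ms expires
     * first, run the deadlock check exactly once, then keep waiting. */
    static bool wait_for_lock(int deadlock_timeout_ms)
    {
        bool already_checked = false;

        pthread_mutex_lock(&mtx);
        while (!lock_granted)
        {
            if (already_checked)
                pthread_cond_wait(&cond, &mtx);     /* wait quietly */
            else
            {
                struct timespec deadline;

                clock_gettime(CLOCK_REALTIME, &deadline);
                deadline.tv_sec  += deadlock_timeout_ms / 1000;
                deadline.tv_nsec += (long) (deadlock_timeout_ms % 1000) * 1000000L;
                if (deadline.tv_nsec >= 1000000000L)
                {
                    deadline.tv_sec++;
                    deadline.tv_nsec -= 1000000000L;
                }
                if (pthread_cond_timedwait(&cond, &mtx, &deadline) == ETIMEDOUT)
                {
                    already_checked = true;         /* timer expired: check once */
                    if (check_deadlock())
                    {
                        pthread_mutex_unlock(&mtx);
                        return false;               /* we are the victim; abort */
                    }
                }
            }
        }
        pthread_mutex_unlock(&mtx);
        return true;
    }

    static void *granter(void *arg)
    {
        struct timespec two_seconds = { 2, 0 };

        (void) arg;
        nanosleep(&two_seconds, NULL);              /* grant after the timeout */
        pthread_mutex_lock(&mtx);
        lock_granted = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mtx);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, granter, NULL);
        printf("%s\n", wait_for_lock(1000) ? "lock granted" : "aborted");
        pthread_join(t, NULL);
        return 0;
    }
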
Lock acquisition (routines LockAcquire and ProcSleep) follows these rules:

1. A lock request is granted immediately if it does not conflict with
any existing or waiting lock request, or if the process already holds an
instance of the same lock type (eg, there's no penalty to acquire a read
lock twice).  Note that a process never conflicts with itself, eg one
can obtain read lock when one already holds exclusive lock.

2. Otherwise the process joins the lock's wait queue.  Normally it will
be added to the end of the queue, but there is an exception: if the
process already holds locks on this same lockable object that conflict
with the request of any pending waiter, then the process will be
inserted in the wait queue just ahead of the first such waiter.  (If we
did not make this check, the deadlock detection code would adjust the
queue order to resolve the conflict, but it's relatively cheap to make
the check in ProcSleep and avoid a deadlock timeout delay in this case.)
Note the special case when inserting before the end of the queue: if the
process's request does not conflict with any existing lock nor any
waiting request before its insertion point, then go ahead and grant the
lock without waiting.

When a lock is released, the lock release routine (ProcLockWakeup) scans
the lock object's wait queue.  Each waiter is awoken if (a) its request
does not conflict with already-granted locks, and (b) its request does
not conflict with the requests of prior un-wakable waiters.  Rule (b)
ensures that conflicting requests are granted in order of arrival.  There
are cases where a later waiter must be allowed to go in front of
conflicting earlier waiters to avoid deadlock, but it is not
ProcLockWakeup's responsibility to recognize these cases; instead, the
deadlock detection code will re-order the wait queue when necessary.
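
A small sketch of that wake-up scan, using made-up types and a two-mode
conflict table (the real ProcLockWakeup() works on the shared PGPROC wait
queue and the lock method's conflict table):

    #include <stdio.h>

    typedef unsigned int LOCKMASK;
    #define LOCKBIT_ON(m) (1U << (m))

    /* Hypothetical modes: 1 = share, 2 = exclusive. */
    static const LOCKMASK conflictTab[3] = {
        0,
        LOCKBIT_ON(2),
        LOCKBIT_ON(1) | LOCKBIT_ON(2)
    };

    typedef struct
    {
        int mode;       /* lock mode this waiter wants */
        int awoken;
    } waiter;

    /* Scan the wait queue front to back.  A waiter is woken if its request
     * conflicts neither with the modes already granted (grantMask) nor with
     * the requests of earlier waiters that could not be woken
     * (aheadRequests).  The second condition is rule (b) above. */
    static void wakeup_scan(waiter *queue, int n, LOCKMASK grantMask)
    {
        LOCKMASK aheadRequests = 0;

        for (int i = 0; i < n; i++)
        {
            int m = queue[i].mode;

            if ((conflictTab[m] & (grantMask | aheadRequests)) == 0)
            {
                queue[i].awoken = 1;
                grantMask |= LOCKBIT_ON(m);      /* it now holds the lock */
            }
            else
                aheadRequests |= LOCKBIT_ON(m);  /* it stays ahead of later waiters */
        }
    }

    int main(void)
    {
        /* Queue order: share, exclusive, share; nothing currently granted. */
        waiter queue[3] = { {1, 0}, {2, 0}, {1, 0} };

        wakeup_scan(queue, 3, 0);
        for (int i = 0; i < 3; i++)
            printf("waiter %d (mode %d): %s\n", i, queue[i].mode,
                   queue[i].awoken ? "awoken" : "still waiting");
        return 0;
    }

With the queue share, exclusive, share and nothing granted, the first share
request is woken, while the trailing share request is left waiting even
though it would not conflict with what is now held, because rule (b) keeps
it behind the exclusive request ahead of it.
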
To perform deadlock checking, we use the standard method of viewing the
various processes as nodes in a directed graph (the waits-for graph or
WFG).  There is a graph edge leading from process A to process B if A
waits for B, ie, A is waiting for some lock and B holds a conflicting
lock.  There is a deadlock condition if and only if the WFG contains a
cycle.  We detect cycles by searching outward along waits-for edges to
see if we return to our starting point.  (A minimal sketch of this search
follows the list of outcomes below.)  There are three possible outcomes:

1. All outgoing paths terminate at a running process (which has no
outgoing edge).

2. A deadlock is detected by looping back to the start point.  We
resolve such a deadlock by canceling the start point's lock request and
reporting an error in that transaction, which normally leads to
transaction abort and release of that transaction's held locks.  Note
that it's sufficient to cancel one request to remove the cycle; we don't
need to kill all the transactions involved.

3. Some path(s) loop back to a node other than the start point.  This
indicates a deadlock, but one that does not involve our starting
process.  We ignore this condition on the grounds that resolving such a
deadlock is the responsibility of the processes involved --- killing our
start-point process would not resolve the deadlock.  So, cases 1 and 3
both report "no deadlock".
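
Putting the search and the three outcomes together, here is a minimal,
self-contained sketch: a toy adjacency-matrix waits-for graph and a
depth-first walk that reports a deadlock only when it gets back to the
start point.  The real FindLockCycle() (described below) works on PGPROC
and PROCLOCK structures, distinguishes hard from soft edges, and honors
the lookaside table; none of that is shown here.

    #include <stdio.h>
    #include <string.h>

    #define MAXPROCS 8

    static int nProcs;
    static int waitsFor[MAXPROCS][MAXPROCS];   /* waitsFor[a][b]: a waits for b */
    static int visited[MAXPROCS];

    /* Return 1 if some waits-for path from 'cur' leads back to 'start'. */
    static int find_cycle(int start, int cur)
    {
        if (visited[cur])
            return 0;               /* already explored from here; this also
                                     * ignores cycles not involving 'start' */
        visited[cur] = 1;

        for (int next = 0; next < nProcs; next++)
        {
            if (!waitsFor[cur][next])
                continue;
            if (next == start)
                return 1;           /* looped back to the start: deadlock */
            if (find_cycle(start, next))
                return 1;
        }
        return 0;                   /* every path ended at a running process */
    }

    int main(void)
    {
        nProcs = 3;
        waitsFor[0][1] = 1;         /* process 0 waits for process 1 */
        waitsFor[1][2] = 1;         /* process 1 waits for process 2 */
        waitsFor[2][0] = 1;         /* process 2 waits for process 0 */

        memset(visited, 0, sizeof(visited));
        printf("deadlock involving process 0: %s\n",
               find_cycle(0, 0) ? "yes" : "no");
        return 0;
    }
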
Postgres' situation is a little more complex than the standard discussion
of deadlock detection, for two reasons:

1. A process can be waiting for more than one other process, since there
might be multiple PROCLOCKs of (non-conflicting) lock types that all
conflict with the waiter's request.  This creates no real difficulty
however; we simply need to be prepared to trace more than one outgoing
edge.

2. If a process A is behind a process B in some lock's wait queue, and
their requested locks conflict, then we must say that A waits for B, since
ProcLockWakeup will never awaken A before B.  This creates additional
edges in the WFG.  We call these "soft" edges, as opposed to the "hard"
edges induced by locks already held.  Note that if B already holds any
locks conflicting with A's request, then their relationship is a hard edge
not a soft edge.

A "soft" block, or wait-priority block, has the same potential for
inducing deadlock as a hard block.  However, we may be able to resolve
a soft block without aborting the transactions involved: we can instead
rearrange the order of the wait queue.  This rearrangement reverses the
direction of the soft edge between two processes with conflicting requests
whose queue order is reversed.  If we can find a rearrangement that
eliminates a cycle without creating new ones, then we can avoid an abort.
Checking for such possible rearrangements is the trickiest part of the
algorithm.

The workhorse of the deadlock detector is a routine FindLockCycle() which
is given a starting point process (which must be a waiting process).
It recursively scans outward across waits-for edges as discussed above.
If it finds no cycle involving the start point, it returns "false".
(As discussed above, we can ignore cycles not involving the start point.)
When such a cycle is found, FindLockCycle() returns "true", and as it
unwinds it also builds a list of any "soft" edges involved in the cycle.
If the resulting list is empty then there is a hard deadlock and the
configuration cannot succeed.  However, if the list is not empty, then
reversing any one of the listed edges through wait-queue rearrangement
will eliminate that cycle.  Since such a reversal might create cycles
elsewhere, we may need to try every possibility.  Therefore, we need to
be able to invoke FindLockCycle() on hypothetical configurations (wait
orders) as well as the current real order.

The easiest way to handle this seems to be to have a lookaside table that
shows the proposed new queue order for each wait queue that we are
considering rearranging.  This table is checked by FindLockCycle, and it
believes the proposed queue order rather than the real order for each lock
that has an entry in the lookaside table.

We build a proposed new queue order by doing a "topological sort" of the
existing entries.  Each soft edge that we are currently considering
reversing creates a property of the partial order that the topological sort
has to enforce.  We must use a sort method that preserves the input
ordering as much as possible, so as not to gratuitously break arrival
order for processes not involved in a deadlock.  (This is not true of the
tsort method shown in Knuth, for example, but it's easily done by a simple
doubly-nested-loop method that emits the first legal candidate at each
step.  Fortunately, we don't need a highly efficient sort algorithm, since
the number of partial order constraints is not likely to be large.)  Note
that failure of the topological sort tells us we have conflicting ordering
constraints, and therefore that the last-added soft edge reversal
conflicts with a prior edge reversal.  We need to detect this case to
avoid an infinite loop in the case where no possible rearrangement will
work: otherwise, we might try a reversal, find that it still leads to
a cycle, then try to un-reverse the reversal while trying to get rid of
that cycle, etc etc.  Topological sort failure tells us the un-reversal
is not a legitimate move in this context.
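
Below is one possible shape of such an order-preserving sort, written as a
self-contained toy (small integer queue positions, a constraint matrix, and
names invented for the example; the real sort in deadlock.c operates on
PGPROC pointers and edge lists).  It fills the output from the back, each
time taking the latest remaining entry that is not required to precede any
other remaining entry, so entries that are not mentioned in any constraint
keep their arrival order:

    #include <stdio.h>

    #define MAXQ 8

    /* before[i][j] != 0 means the process at original queue position i must
     * appear before the one at original position j in the proposed order. */
    static int before[MAXQ][MAXQ];

    /* Returns 1 and fills out[] with queue positions in the proposed order,
     * or returns 0 if the constraints contradict each other. */
    static int topo_sort(int n, int out[])
    {
        int remaining[MAXQ];

        for (int i = 0; i < n; i++)
            remaining[i] = 1;

        for (int slot = n - 1; slot >= 0; slot--)
        {
            int cand = -1;

            /* Find the latest remaining entry with no remaining successor. */
            for (int i = n - 1; i >= 0 && cand < 0; i--)
            {
                if (!remaining[i])
                    continue;
                int must_precede_someone = 0;
                for (int j = 0; j < n; j++)
                    if (before[i][j] && remaining[j])
                        must_precede_someone = 1;
                if (!must_precede_someone)
                    cand = i;
            }
            if (cand < 0)
                return 0;           /* conflicting ordering constraints */
            out[slot] = cand;
            remaining[cand] = 0;
        }
        return 1;
    }

    int main(void)
    {
        /* Original queue order: position 0 = A, 1 = B, 2 = C.
         * One soft-edge reversal adds the constraint "C before A". */
        int out[MAXQ];

        before[2][0] = 1;
        if (topo_sort(3, out))
        {
            for (int k = 0; k < 3; k++)
                printf("%c ", "ABC"[out[k]]);
            printf("\n");           /* prints: C A B */
        }
        else
            printf("constraints are contradictory\n");
        return 0;
    }

With queue A,B,C and the single constraint "C before A", this produces
C A B, matching the intermediate rearrangement mentioned in the example
near the end of this section; adding the contradictory constraint
"A before C" as well makes the sort fail, which is exactly the
infinite-loop guard described above.
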
So, the basic step in our rearrangement method is to take a list of
soft edges in a cycle (as returned by FindLockCycle()) and successively
try the reversal of each one as a topological-sort constraint added to
whatever constraints we are already considering.  We recursively search
through all such sets of constraints to see if any one eliminates all
the deadlock cycles at once.  Although this might seem impossibly
inefficient, it shouldn't be a big problem in practice, because there
will normally be very few, and not very large, deadlock cycles --- if
any at all.  So the combinatorial inefficiency isn't going to hurt us.
Besides, it's better to spend some time to guarantee that we've checked
all possible escape routes than to abort a transaction when we didn't
really have to.

Each edge reversal constraint can be viewed as requesting that the waiting
process A be moved to before the blocking process B in the wait queue they
are both in.  This action will reverse the desired soft edge, as well as
any other soft edges between A and other processes it is advanced over.
No other edges will be affected (note this is actually a constraint on our
topological sort method to not re-order the queue more than necessary).
Therefore, we can be sure we have not created any new deadlock cycles if
neither FindLockCycle(A) nor FindLockCycle(B) discovers any cycle.  Given
the above-defined behavior of FindLockCycle, each of these searches is
necessary as well as sufficient, since FindLockCycle starting at the
original start point will not complain about cycles that include A or B
but not the original start point.

In short then, a proposed rearrangement of the wait queue(s) is determined
by one or more broken soft edges A->B, fully specified by the output of
topological sorts of each wait queue involved, and then tested by invoking
FindLockCycle() starting at the original start point as well as each of
the mentioned processes (A's and B's).  If none of the tests detect a
cycle, then we have a valid configuration and can implement it by
reordering the wait queues per the sort outputs (and then applying
ProcLockWakeup on each reordered queue, in case a waiter has become wakable).
If any test detects a soft cycle, we can try to resolve it by adding each
soft link in that cycle, in turn, to the proposed rearrangement list.
This is repeated recursively until we either find a workable rearrangement
or determine that none exists.  In the latter case, the outer level
resolves the deadlock by aborting the original start-point transaction.

The particular order in which rearrangements are tried depends on the
order FindLockCycle() happens to scan in, so if there are multiple
workable rearrangements of the wait queues, then it is unspecified which
one will be chosen.  What's more important is that we guarantee to try
every queue rearrangement that could lead to success.  (For example,
if we have A before B before C and the needed order constraints are
C before A and B before C, we would first discover that A before C
doesn't work and try the rearrangement C before A before B.  This would
eventually lead to the discovery of the additional constraint B before C.)

Got that?

Miscellaneous notes:

1. It is easily proven that no deadlock will be missed due to our
asynchronous invocation of deadlock checking.  A deadlock cycle in the WFG
is formed when the last edge in the cycle is added; therefore the last
process in the cycle to wait (the one from which that edge is outgoing) is
certain to detect and resolve the cycle when it later runs CheckDeadLock.
This holds even if that edge addition created multiple cycles; the process
may indeed abort without ever noticing those additional cycles, but we
don't particularly care.  The only other possible creation of deadlocks is
during deadlock resolution's rearrangement of wait queues, and we already
saw that that algorithm will prove that it creates no new deadlocks before
it attempts to actually execute any rearrangement.

2. It is not certain that a deadlock will be resolved by aborting the
last-to-wait process.  If earlier waiters in the cycle have not yet run
CheckDeadLock, then the first one to do so will be the victim.

3. No live (wakable) process can be missed by ProcLockWakeup, since it
examines every member of the wait queue (this was not true in the 7.0
implementation, BTW).  Therefore, if ProcLockWakeup is always invoked
after a lock is released or a wait queue is rearranged, there can be no
failure to wake a wakable process.  One should also note that
LockWaitCancel (abort a waiter due to outside factors) must run
ProcLockWakeup, in case the canceled waiter was soft-blocking other
waiters.

4. We can minimize excess rearrangement-trial work by being careful to
scan the wait queue from the front when looking for soft edges.  For
example, if we have queue order A,B,C and C has deadlock conflicts with
both A and B, we want to generate the "C before A" constraint first,
rather than wasting time with "C before B", which won't move C far
enough up.  So we look for soft edges outgoing from C starting at the
front of the wait queue.

5. The working data structures needed by the deadlock detection code can
be limited to numbers of entries computed from MaxBackends.  Therefore,
we can allocate the worst-case space needed during backend startup.  This
seems a safer approach than trying to allocate workspace on the fly; we
don't want to risk having the deadlock detector run out of memory, else
we really have no guarantees at all that deadlock will be detected.


USER LOCKS

User locks are handled totally on the application side, as long-term
cooperative locks which extend beyond the normal transaction boundaries.
Their purpose is to indicate to an application that someone is `working'
on an item.  So it is possible to put a user lock on a tuple's oid,
retrieve the tuple, work on it for an hour and then update it and remove
the lock.  While the lock is active other clients can still read and write
the tuple but they can be aware that it has been locked at the application
level by someone.

User locks and normal locks are completely orthogonal and they don't
interfere with each other.

User locks are always non-blocking, therefore they are never acquired if
already held by another process.  They must be released explicitly by the
application but they are released automatically when a backend terminates.

The lockmode parameter can have the same values as for normal locks although
probably only ExclusiveLock can have some practical use.

DZ - 22 Nov 1997