postgresql/src/backend/utils/time/tqual.c

1345 lines
40 KiB
C
Raw Normal View History

/*-------------------------------------------------------------------------
*
* tqual.c
* POSTGRES "time qualification" code, ie, tuple visibility rules.
*
* NOTE: all the HeapTupleSatisfies routines will update the tuple's
* "hint" status bits if we see that the inserting or deleting transaction
* has now committed or aborted (and it is safe to set the hint bits).
* If the hint bits are changed, SetBufferCommitInfoNeedsSave is called on
* the passed-in buffer. The caller must hold not only a pin, but at least
* shared buffer content lock on the buffer containing the tuple.
*
* NOTE: must check TransactionIdIsInProgress (which looks in PGPROC array)
* before TransactionIdDidCommit/TransactionIdDidAbort (which look in
* pg_clog). Otherwise we have a race condition: we might decide that a
* just-committed transaction crashed, because none of the tests succeed.
* xact.c is careful to record commit/abort in pg_clog before it unsets
* MyProc->xid in PGPROC array. That fixes that problem, but it also
* means there is a window where TransactionIdIsInProgress and
* TransactionIdDidCommit will both return true. If we check only
* TransactionIdDidCommit, we could consider a tuple committed when a
* later GetSnapshotData call will still think the originating transaction
2005-10-15 04:49:52 +02:00
* is in progress, which leads to application-level inconsistency. The
* upshot is that we gotta check TransactionIdIsInProgress first in all
* code paths, except for a few cases where we are looking at
* subtransactions of our own main transaction and so there can't be any
* race condition.
*
* Summary of visibility functions:
*
* HeapTupleSatisfiesMVCC()
* visible to supplied snapshot, excludes current command
* HeapTupleSatisfiesNow()
* visible to instant snapshot, excludes current command
* HeapTupleSatisfiesUpdate()
* like HeapTupleSatisfiesNow(), but with user-supplied command
* counter and more complex result
* HeapTupleSatisfiesSelf()
* visible to instant snapshot and current command
* HeapTupleSatisfiesDirty()
* like HeapTupleSatisfiesSelf(), but includes open transactions
* HeapTupleSatisfiesVacuum()
* visible to any running transaction, used by VACUUM
* HeapTupleSatisfiesToast()
* visible unless part of interrupted vacuum, used for TOAST
* HeapTupleSatisfiesAny()
* all tuples are visible
*
2010-01-02 17:58:17 +01:00
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/utils/time/tqual.c,v 1.116 2010/02/08 04:33:54 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/transam.h"
#include "access/xact.h"
2006-07-13 19:47:02 +02:00
#include "storage/bufmgr.h"
#include "storage/procarray.h"
#include "utils/tqual.h"
/* Static variables representing various special snapshot semantics */
SnapshotData SnapshotNowData = {HeapTupleSatisfiesNow};
SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
/*
* SetHintBits()
*
* Set commit/abort hint bits on a tuple, if appropriate at this time.
*
* It is only safe to set a transaction-committed hint bit if we know the
* transaction's commit record has been flushed to disk. We cannot change
* the LSN of the page here because we may hold only a share lock on the
* buffer, so we can't use the LSN to interlock this; we have to just refrain
* from setting the hint bit until some future re-examination of the tuple.
*
2007-11-15 22:14:46 +01:00
* We can always set hint bits when marking a transaction aborted. (Some
* code in heapam.c relies on that!)
*
* Also, if we are cleaning up HEAP_MOVED_IN or HEAP_MOVED_OFF entries, then
* we can always set the hint bits, since old-style VACUUM FULL always used
* synchronous commits and didn't move tuples that weren't previously
* hinted. (This is not known by this subroutine, but is applied by its
* callers.) Note: old-style VACUUM FULL is gone, but we have to keep this
* module's support for MOVED_OFF/MOVED_IN flag bits for as long as we
* support in-place update from pre-9.0 databases.
*
* Normal commits may be asynchronous, so for those we need to get the LSN
* of the transaction and then check whether this is flushed.
*
* The caller should pass xid as the XID of the transaction to check, or
* InvalidTransactionId if no check is needed.
*/
static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid)
{
if (TransactionIdIsValid(xid))
{
/* NB: xid must be known committed here! */
2007-11-15 22:14:46 +01:00
XLogRecPtr commitLSN = TransactionIdGetCommitLSN(xid);
if (XLogNeedsFlush(commitLSN))
return; /* not flushed yet, so don't set hint */
}
tuple->t_infomask |= infomask;
SetBufferCommitInfoNeedsSave(buffer);
}
/*
* HeapTupleSetHintBits --- exported version of SetHintBits()
*
* This must be separate because of C99's brain-dead notions about how to
* implement inline functions.
*/
void
HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid)
{
SetHintBits(tuple, buffer, infomask, xid);
}
/*
* HeapTupleSatisfiesSelf
2002-01-17 00:51:56 +01:00
* True iff heap tuple is valid "for itself".
*
2002-01-17 00:51:56 +01:00
* Here, we consider the effects of:
* all committed transactions (as of the current instant)
* previous commands of this transaction
* changes made by the current command
*
* Note:
* Assumes heap tuple is valid.
*
* The satisfaction of "itself" requires the following:
*
* ((Xmin == my-transaction && the row was updated by the current transaction, and
* (Xmax is null it was not deleted
* [|| Xmax != my-transaction)]) [or it was deleted by another transaction]
* ||
*
* (Xmin is committed && the row was modified by a committed transaction, and
* (Xmax is null || the row has not been deleted, or
* (Xmax != my-transaction && the row was deleted by another transaction
* Xmax is not committed))) that has not been committed
*/
bool
HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
1997-11-02 16:27:14 +01:00
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
1998-09-01 05:29:17 +02:00
return false;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return false;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
}
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
{
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
1998-09-01 05:29:17 +02:00
return true;
if (tuple->t_infomask & HEAP_IS_LOCKED) /* not deleter */
return true;
Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
/* deleting subtransaction must have aborted */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
return false;
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
1998-09-01 05:29:17 +02:00
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
/* by here, the inserting transaction has committed */
From: Dan McGuirk <mcguirk@indirect.com> Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com> To: hackers@hub.org Subject: [HACKERS] tmin writeback optimization I was doing some profiling of the backend, and noticed that during a certain benchmark I was running somewhere between 30% and 75% of the backend's CPU time was being spent in calls to TransactionIdDidCommit() from HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that changed rows' transactions had in fact been committed even though the rows' tmin values had not yet been set. When a query looks at a given row, it needs to figure out whether the transaction that changed the row has been committed and hence it should pay attention to the row, or whether on the other hand the transaction is still in progress or has been aborted and hence the row should be ignored. If a tmin value is set, it is known definitively that the row's transaction has been committed. However, if tmin is not set, the transaction referred to in xmin must be looked up in pg_log, and this is what the backend was spending a lot of time doing during my benchmark. So, implementing a method suggested by Vadim, I created the following patch that, the first time a query finds a committed row whose tmin value is not set, sets it, and marks the buffer where the row is stored as dirty. (It works for tmax, too.) This doesn't result in the boost in real time performance I was hoping for, however it does decrease backend CPU usage by up to two-thirds in certain situations, so it could be rather beneficial in high-concurrency settings.
1997-03-28 08:06:53 +01:00
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
1998-09-01 05:29:17 +02:00
return true;
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
1999-05-25 18:15:34 +02:00
return false; /* updated by other */
}
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
return true;
}
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
1998-09-01 05:29:17 +02:00
return false;
}
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return true;
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
1998-09-01 05:29:17 +02:00
return true;
}
/* xmax transaction committed */
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
1998-09-01 05:29:17 +02:00
return false;
}
/*
1999-05-25 18:15:34 +02:00
* HeapTupleSatisfiesNow
2002-01-17 00:51:56 +01:00
* True iff heap tuple is valid "now".
*
2002-01-17 00:51:56 +01:00
* Here, we consider the effects of:
* all committed transactions (as of the current instant)
* previous commands of this transaction
*
2002-01-17 00:51:56 +01:00
* Note we do _not_ include changes made by the current command. This
* solves the "Halloween problem" wherein an UPDATE might try to re-update
* its own output tuples, http://en.wikipedia.org/wiki/Halloween_Problem.
*
* Note:
* Assumes heap tuple is valid.
*
* The satisfaction of "now" requires the following:
*
* ((Xmin == my-transaction && inserted by the current transaction
* Cmin < my-command && before this command, and
* (Xmax is null || the row has not been deleted, or
* (Xmax == my-transaction && it was deleted by the current transaction
* Cmax >= my-command))) but not before this command,
* || or
* (Xmin is committed && the row was inserted by a committed transaction, and
* (Xmax is null || the row has not been deleted, or
* (Xmax == my-transaction && the row is being deleted by this transaction
* Cmax >= my-command) || but it's not deleted "yet", or
* (Xmax != my-transaction && the row was deleted by another transaction
* Xmax is not committed)))) that has not been committed
*
* mao says 17 march 1993: the tests in this routine are correct;
* if you think they're not, you're wrong, and you should think
* about it again. i know, it happened to me. we don't need to
* check commit time against the start time of this transaction
* because 2ph locking protects us from doing the wrong thing.
* if you mess around here, you'll break serializability. the only
* problem with this code is that it does the wrong thing for system
* catalog updates, because the catalogs aren't subject to 2ph, so
* the serializability guarantees we provide don't extend to xacts
* that do catalog accesses. this is unfortunate, but not critical.
*/
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
1997-11-02 16:27:14 +01:00
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
1998-09-01 05:29:17 +02:00
return false;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return false;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
}
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
{
if (HeapTupleHeaderGetCmin(tuple) >= GetCurrentCommandId(false))
return false; /* inserted after scan started */
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
1998-09-01 05:29:17 +02:00
return true;
if (tuple->t_infomask & HEAP_IS_LOCKED) /* not deleter */
return true;
Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
/* deleting subtransaction must have aborted */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId(false))
1998-09-01 05:29:17 +02:00
return true; /* deleted after scan started */
1997-11-02 16:27:14 +01:00
else
return false; /* deleted before scan started */
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
1998-09-01 05:29:17 +02:00
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
From: Dan McGuirk <mcguirk@indirect.com> Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com> To: hackers@hub.org Subject: [HACKERS] tmin writeback optimization I was doing some profiling of the backend, and noticed that during a certain benchmark I was running somewhere between 30% and 75% of the backend's CPU time was being spent in calls to TransactionIdDidCommit() from HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that changed rows' transactions had in fact been committed even though the rows' tmin values had not yet been set. When a query looks at a given row, it needs to figure out whether the transaction that changed the row has been committed and hence it should pay attention to the row, or whether on the other hand the transaction is still in progress or has been aborted and hence the row should be ignored. If a tmin value is set, it is known definitively that the row's transaction has been committed. However, if tmin is not set, the transaction referred to in xmin must be looked up in pg_log, and this is what the backend was spending a lot of time doing during my benchmark. So, implementing a method suggested by Vadim, I created the following patch that, the first time a query finds a committed row whose tmin value is not set, sets it, and marks the buffer where the row is stored as dirty. (It works for tmax, too.) This doesn't result in the boost in real time performance I was hoping for, however it does decrease backend CPU usage by up to two-thirds in certain situations, so it could be rather beneficial in high-concurrency settings.
1997-03-28 08:06:53 +01:00
/* by here, the inserting transaction has committed */
From: Dan McGuirk <mcguirk@indirect.com> Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com> To: hackers@hub.org Subject: [HACKERS] tmin writeback optimization I was doing some profiling of the backend, and noticed that during a certain benchmark I was running somewhere between 30% and 75% of the backend's CPU time was being spent in calls to TransactionIdDidCommit() from HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that changed rows' transactions had in fact been committed even though the rows' tmin values had not yet been set. When a query looks at a given row, it needs to figure out whether the transaction that changed the row has been committed and hence it should pay attention to the row, or whether on the other hand the transaction is still in progress or has been aborted and hence the row should be ignored. If a tmin value is set, it is known definitively that the row's transaction has been committed. However, if tmin is not set, the transaction referred to in xmin must be looked up in pg_log, and this is what the backend was spending a lot of time doing during my benchmark. So, implementing a method suggested by Vadim, I created the following patch that, the first time a query finds a committed row whose tmin value is not set, sets it, and marks the buffer where the row is stored as dirty. (It works for tmax, too.) This doesn't result in the boost in real time performance I was hoping for, however it does decrease backend CPU usage by up to two-thirds in certain situations, so it could be rather beneficial in high-concurrency settings.
1997-03-28 08:06:53 +01:00
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
1998-09-01 05:29:17 +02:00
return true;
1997-11-02 16:27:14 +01:00
if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
1998-09-01 05:29:17 +02:00
return false;
}
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
return true;
}
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId(false))
1998-09-01 05:29:17 +02:00
return true; /* deleted after scan started */
1997-11-02 16:27:14 +01:00
else
1998-09-01 05:29:17 +02:00
return false; /* deleted before scan started */
}
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return true;
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
1998-09-01 05:29:17 +02:00
return true;
}
1997-11-02 16:27:14 +01:00
/* xmax transaction committed */
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
1998-09-01 05:29:17 +02:00
return false;
}
/*
* HeapTupleSatisfiesAny
* Dummy "satisfies" routine: any tuple satisfies SnapshotAny.
*/
bool
HeapTupleSatisfiesAny(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
return true;
}
/*
* HeapTupleSatisfiesToast
2002-01-17 00:51:56 +01:00
* True iff heap tuple is valid as a TOAST row.
*
* This is a simplified version that only checks for VACUUM moving conditions.
* It's appropriate for TOAST usage because TOAST really doesn't want to do
* its own time qual checks; if you can see the main table row that contains
2002-09-04 22:31:48 +02:00
* a TOAST reference, you should be able to see the TOASTed value. However,
* vacuuming a TOAST table is independent of the main table, and in case such
* a vacuum fails partway through, we'd better do this much checking.
*
* Among other things, this means you can't do UPDATEs of rows in a TOAST
* table.
*/
bool
HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return false;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
}
}
/* otherwise assume the tuple is valid for TOAST. */
return true;
}
/*
* HeapTupleSatisfiesUpdate
*
2002-01-17 00:51:56 +01:00
* Same logic as HeapTupleSatisfiesNow, but returns a more detailed result
* code, since UPDATE needs to know more than "is it visible?". Also,
* tuples of my own xact are tested against the passed CommandId not
* CurrentCommandId.
*
* The possible return codes are:
*
* HeapTupleInvisible: the tuple didn't exist at all when the scan started,
* e.g. it was created by a later CommandId.
*
* HeapTupleMayBeUpdated: The tuple is valid and visible, so it may be
* updated.
*
* HeapTupleSelfUpdated: The tuple was updated by the current transaction,
* after the current scan started.
*
* HeapTupleUpdated: The tuple was updated by a committed transaction.
*
* HeapTupleBeingUpdated: The tuple is being updated by an in-progress
* transaction other than the current transaction. (Note: this includes
* the case where the tuple is share-locked by a MultiXact, even if the
* MultiXact includes the current transaction. Callers that want to
* distinguish that case must test for it themselves.)
*/
HTSU_Result
HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
Buffer buffer)
{
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HeapTupleInvisible;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return HeapTupleInvisible;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HeapTupleInvisible;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return HeapTupleInvisible;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HeapTupleInvisible;
}
}
}
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
{
if (HeapTupleHeaderGetCmin(tuple) >= curcid)
2005-10-15 04:49:52 +02:00
return HeapTupleInvisible; /* inserted after scan started */
2002-09-04 22:31:48 +02:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return HeapTupleMayBeUpdated;
if (tuple->t_infomask & HEAP_IS_LOCKED) /* not deleter */
return HeapTupleMayBeUpdated;
Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
/* deleting subtransaction must have aborted */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return HeapTupleMayBeUpdated;
}
if (HeapTupleHeaderGetCmax(tuple) >= curcid)
2005-10-15 04:49:52 +02:00
return HeapTupleSelfUpdated; /* updated after scan started */
else
2005-10-15 04:49:52 +02:00
return HeapTupleInvisible; /* updated before scan started */
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
return HeapTupleInvisible;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HeapTupleInvisible;
}
}
/* by here, the inserting transaction has committed */
2002-09-04 22:31:48 +02:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;
if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
return HeapTupleUpdated; /* updated by other */
}
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
return HeapTupleBeingUpdated;
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return HeapTupleMayBeUpdated;
}
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
if (HeapTupleHeaderGetCmax(tuple) >= curcid)
2005-10-15 04:49:52 +02:00
return HeapTupleSelfUpdated; /* updated after scan started */
else
return HeapTupleInvisible; /* updated before scan started */
}
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return HeapTupleBeingUpdated;
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return HeapTupleMayBeUpdated;
}
/* xmax transaction committed */
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return HeapTupleMayBeUpdated;
}
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
1999-05-25 18:15:34 +02:00
return HeapTupleUpdated; /* updated by other */
}
2002-01-17 00:51:56 +01:00
/*
* HeapTupleSatisfiesDirty
* True iff heap tuple is valid including effects of open transactions.
*
2002-01-17 00:51:56 +01:00
* Here, we consider the effects of:
* all committed and in-progress transactions (as of the current instant)
* previous commands of this transaction
2002-01-17 00:51:56 +01:00
* changes made by the current command
*
* This is essentially like HeapTupleSatisfiesSelf as far as effects of
* the current transaction and committed/aborted xacts are concerned.
* However, we also include the effects of other xacts still in progress.
*
* A special hack is that the passed-in snapshot struct is used as an
* output argument to return the xids of concurrent xacts that affected the
* tuple. snapshot->xmin is set to the tuple's xmin if that is another
* transaction that's still in progress; or to InvalidTransactionId if the
* tuple's xmin is committed good, committed dead, or my own xact. Similarly
* for snapshot->xmax and the tuple's xmax.
*/
bool
HeapTupleSatisfiesDirty(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
snapshot->xmin = snapshot->xmax = InvalidTransactionId;
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return false;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
}
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
{
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return true;
if (tuple->t_infomask & HEAP_IS_LOCKED) /* not deleter */
return true;
Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
/* deleting subtransaction must have aborted */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
return false;
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
{
snapshot->xmin = HeapTupleHeaderGetXmin(tuple);
/* XXX shouldn't we fall through to look at xmax? */
1999-05-25 18:15:34 +02:00
return true; /* in insertion by other */
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
/* by here, the inserting transaction has committed */
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
1999-05-25 18:15:34 +02:00
return false; /* updated by other */
}
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
return true;
}
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
return false;
}
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
{
snapshot->xmax = HeapTupleHeaderGetXmax(tuple);
return true;
}
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
/* xmax transaction committed */
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
1999-05-25 18:15:34 +02:00
return false; /* updated by other */
}
1998-12-16 12:53:55 +01:00
/*
* HeapTupleSatisfiesMVCC
* True iff heap tuple is valid for the given MVCC snapshot.
*
2002-01-17 00:51:56 +01:00
* Here, we consider the effects of:
* all transactions committed as of the time of the given snapshot
* previous commands of this transaction
*
* Does _not_ include:
2002-01-17 00:51:56 +01:00
* transactions shown as in-progress by the snapshot
* transactions started after the snapshot was taken
* changes made by the current command
*
* This is the same as HeapTupleSatisfiesNow, except that transactions that
* were in progress or as yet unstarted when the snapshot was taken will
* be treated as uncommitted, even if they have committed by now.
*
* (Notice, however, that the tuple status hint bits will be updated on the
* basis of the true state of the transaction, even if we then pretend we
* can't see it.)
*/
1998-12-16 12:53:55 +01:00
bool
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
1998-12-16 12:53:55 +01:00
{
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
1998-12-16 12:53:55 +01:00
return false;
if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return false;
if (!TransactionIdIsInProgress(xvac))
{
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (!TransactionIdIsCurrentTransactionId(xvac))
{
if (TransactionIdIsInProgress(xvac))
return false;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
}
}
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
1998-12-16 12:53:55 +01:00
{
if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
1998-12-16 12:53:55 +01:00
return false; /* inserted after scan started */
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return true;
if (tuple->t_infomask & HEAP_IS_LOCKED) /* not deleter */
return true;
Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));
if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
/* deleting subtransaction must have aborted */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return true;
}
if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
1998-12-16 12:53:55 +01:00
return true; /* deleted after scan started */
else
return false; /* deleted before scan started */
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
1998-12-16 12:53:55 +01:00
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
1998-12-16 12:53:55 +01:00
}
1999-05-25 18:15:34 +02:00
/*
* By here, the inserting transaction has committed - have to check
* when...
1998-12-16 12:53:55 +01:00
*/
if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
return false; /* treat as still in progress */
1998-12-16 12:53:55 +01:00
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
1998-12-16 12:53:55 +01:00
return true;
}
1998-12-16 12:53:55 +01:00
if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
1998-12-16 12:53:55 +01:00
{
if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
1999-05-25 18:15:34 +02:00
return true; /* deleted after scan started */
1998-12-16 12:53:55 +01:00
else
1999-05-25 18:15:34 +02:00
return false; /* deleted before scan started */
1998-12-16 12:53:55 +01:00
}
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return true;
if (!TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
1998-12-16 12:53:55 +01:00
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
1998-12-16 12:53:55 +01:00
return true;
}
/* xmax transaction committed */
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
1998-12-16 12:53:55 +01:00
}
/*
* OK, the deleting transaction committed too ... but when?
*/
if (XidInMVCCSnapshot(HeapTupleHeaderGetXmax(tuple), snapshot))
return true; /* treat as still in progress */
1999-05-25 18:15:34 +02:00
1998-12-16 12:53:55 +01:00
return false;
}
/*
* HeapTupleSatisfiesVacuum
*
2002-01-17 00:51:56 +01:00
* Determine the status of tuples for VACUUM purposes. Here, what
* we mainly want to know is if a tuple is potentially visible to *any*
* running transaction. If so, it can't be removed yet by VACUUM.
*
* OldestXmin is a cutoff XID (obtained from GetOldestXmin()). Tuples
* deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
* still be visible to some open transaction, so we can't remove them,
* even if we see that the deleting transaction has committed.
*/
HTSV_Result
HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
Buffer buffer)
{
/*
* Has inserting transaction committed?
*
* If the inserting transaction aborted, then the tuple was never visible
* to any other transaction, so we can delete it immediately.
*/
if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HEAPTUPLE_DEAD;
else if (tuple->t_infomask & HEAP_MOVED_OFF)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return HEAPTUPLE_DELETE_IN_PROGRESS;
if (TransactionIdIsInProgress(xvac))
return HEAPTUPLE_DELETE_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HEAPTUPLE_DEAD;
}
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
}
else if (tuple->t_infomask & HEAP_MOVED_IN)
{
TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
2003-08-04 02:43:34 +02:00
if (TransactionIdIsCurrentTransactionId(xvac))
return HEAPTUPLE_INSERT_IN_PROGRESS;
if (TransactionIdIsInProgress(xvac))
return HEAPTUPLE_INSERT_IN_PROGRESS;
if (TransactionIdDidCommit(xvac))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
InvalidTransactionId);
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HEAPTUPLE_DEAD;
}
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
{
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return HEAPTUPLE_INSERT_IN_PROGRESS;
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HEAPTUPLE_INSERT_IN_PROGRESS;
/* inserted and then deleted by same xact */
return HEAPTUPLE_DELETE_IN_PROGRESS;
}
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetXmin(tuple));
else
{
/*
2005-10-15 04:49:52 +02:00
* Not in Progress, Not Committed, so either Aborted or crashed
*/
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return HEAPTUPLE_DEAD;
}
2007-11-15 22:14:46 +01:00
/*
* At this point the xmin is known committed, but we might not have
2007-11-15 22:14:46 +01:00
* been able to set the hint bit yet; so we can no longer Assert that
* it's set.
*/
}
/*
2005-10-15 04:49:52 +02:00
* Okay, the inserter committed, so it was good at some point. Now what
* about the deleting transaction?
*/
if (tuple->t_infomask & HEAP_XMAX_INVALID)
return HEAPTUPLE_LIVE;
if (tuple->t_infomask & HEAP_IS_LOCKED)
{
/*
2005-10-15 04:49:52 +02:00
* "Deleting" xact really only locked it, so the tuple is live in any
* case. However, we should make sure that either XMAX_COMMITTED or
2007-11-15 22:14:46 +01:00
* XMAX_INVALID gets set once the xact is gone, to reduce the costs of
* examining the tuple for future xacts. Also, marking dead
* MultiXacts as invalid here provides defense against MultiXactId
* wraparound (see also comments in heap_freeze_tuple()).
*/
if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
if (MultiXactIdIsRunning(HeapTupleHeaderGetXmax(tuple)))
return HEAPTUPLE_LIVE;
}
else
{
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return HEAPTUPLE_LIVE;
}
2003-08-04 02:43:34 +02:00
/*
2005-10-15 04:49:52 +02:00
* We don't really care whether xmax did commit, abort or crash.
* We know that xmax did lock the tuple, but it did not and will
* never actually update it.
*/
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
}
return HEAPTUPLE_LIVE;
}
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
/* MultiXacts are currently only allowed to lock tuples */
Assert(tuple->t_infomask & HEAP_IS_LOCKED);
return HEAPTUPLE_LIVE;
}
if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsInProgress(HeapTupleHeaderGetXmax(tuple)))
return HEAPTUPLE_DELETE_IN_PROGRESS;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmax(tuple)))
SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
HeapTupleHeaderGetXmax(tuple));
else
{
/*
2005-10-15 04:49:52 +02:00
* Not in Progress, Not Committed, so either Aborted or crashed
*/
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
InvalidTransactionId);
return HEAPTUPLE_LIVE;
}
2007-11-15 22:14:46 +01:00
/*
* At this point the xmax is known committed, but we might not have
2007-11-15 22:14:46 +01:00
* been able to set the hint bit yet; so we can no longer Assert that
* it's set.
*/
}
/*
* Deleter committed, but check special cases.
*/
if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
2002-09-04 22:31:48 +02:00
HeapTupleHeaderGetXmax(tuple)))
{
/*
2005-10-15 04:49:52 +02:00
* Inserter also deleted it, so it was never visible to anyone else.
* However, we can only remove it early if it's not an updated tuple;
* else its parent tuple is linking to it via t_ctid, and this tuple
* mustn't go away before the parent does.
*/
if (!(tuple->t_infomask & HEAP_UPDATED))
return HEAPTUPLE_DEAD;
}
if (!TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), OldestXmin))
{
/* deleting xact is too recent, tuple could still be visible */
return HEAPTUPLE_RECENTLY_DEAD;
}
/* Otherwise, it's dead and removable */
return HEAPTUPLE_DEAD;
}
/*
* XidInMVCCSnapshot
* Is the given XID still-in-progress according to the snapshot?
*
* Note: GetSnapshotData never stores either top xid or subxids of our own
* backend into a snapshot, so these xids will not be reported as "running"
* by this function. This is OK for current uses, because we actually only
* apply this for known-committed XIDs.
*/
static bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
uint32 i;
/*
* Make a quick range check to eliminate most XIDs without looking at the
2006-10-04 02:30:14 +02:00
* xip arrays. Note that this is OK even if we convert a subxact XID to
* its parent below, because a subxact with XID < xmin has surely also got
* a parent with XID < xmin, while one with XID >= xmax must belong to a
* parent that was not yet committed at the time of this snapshot.
*/
/* Any xid < xmin is not in-progress */
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
/* Any xid >= xmax is in-progress */
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
/*
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
* Snapshot information is stored slightly differently in snapshots
* taken during recovery.
*/
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
if (!snapshot->takenDuringRecovery)
{
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
/*
* If the snapshot contains full subxact data, the fastest way to check
* things is just to compare the given XID against both subxact XIDs and
* top-level XIDs. If the snapshot overflowed, we have to use pg_subtrans
* to convert a subxact XID to its parent XID, but then we need only look
* at top-level XIDs not subxacts.
*/
if (!snapshot->suboverflowed)
{
/* full data, so search subxip */
int32 j;
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
for (j = 0; j < snapshot->subxcnt; j++)
{
if (TransactionIdEquals(xid, snapshot->subxip[j]))
return true;
}
/* not there, fall through to search xip[] */
}
else
{
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
/* overflowed, so convert xid to top-level */
xid = SubTransGetTopmostTransaction(xid);
/*
* If xid was indeed a subxact, we might now have an xid < xmin, so
* recheck to avoid an array scan. No point in rechecking xmax.
*/
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
}
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
for (i = 0; i < snapshot->xcnt; i++)
{
if (TransactionIdEquals(xid, snapshot->xip[i]))
return true;
}
}
else
{
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
int32 j;
/*
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
* In recovery we store all xids in the subxact array because it
* is by far the bigger array, and we mostly don't know which xids
* are top-level and which are subxacts. The xip array is empty.
*
* We start by searching subtrans, if we overflowed.
*/
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
if (snapshot->suboverflowed)
{
/* overflowed, so convert xid to top-level */
xid = SubTransGetTopmostTransaction(xid);
Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
/*
* If xid was indeed a subxact, we might now have an xid < xmin, so
* recheck to avoid an array scan. No point in rechecking xmax.
*/
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
}
/*
* We now have either a top-level xid higher than xmin or an
* indeterminate xid. We don't know whether it's top level or subxact
* but it doesn't matter. If it's present, the xid is visible.
*/
for (j = 0; j < snapshot->subxcnt; j++)
{
if (TransactionIdEquals(xid, snapshot->subxip[j]))
return true;
}
}
return false;
}