postgresql/src/backend/access/transam/varsup.c

371 lines
13 KiB
C
Raw Normal View History

/*-------------------------------------------------------------------------
*
* varsup.c
2000-11-30 09:46:26 +01:00
* postgres OID & XID variables support routines
*
* Copyright (c) 2000-2007, PostgreSQL Global Development Group
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/varsup.c,v 1.77 2007/01/05 22:19:23 momjian Exp $
*
*-------------------------------------------------------------------------
*/
2000-11-30 09:46:26 +01:00
#include "postgres.h"
1996-10-21 09:15:18 +02:00
#include "access/clog.h"
#include "access/subtrans.h"
2000-11-30 09:46:26 +01:00
#include "access/transam.h"
#include "miscadmin.h"
#include "postmaster/autovacuum.h"
#include "storage/pmsignal.h"
#include "storage/proc.h"
#include "utils/builtins.h"
/* Number of OIDs to prefetch (preallocate) per XLOG write */
XLOG (and related) changes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.
2001-03-13 02:17:06 +01:00
#define VAR_OID_PREFETCH 8192
2000-11-30 09:46:26 +01:00
/* pointer to "variable cache" in shared memory (set up by shmem.c) */
VariableCache ShmemVariableCache = NULL;
/*
* Allocate the next XID for my new transaction.
*/
TransactionId
GetNewTransactionId(bool isSubXact)
{
TransactionId xid;
2000-11-30 09:46:26 +01:00
/*
2001-03-22 05:01:46 +01:00
* During bootstrap initialization, we return the special bootstrap
* transaction id.
*/
if (IsBootstrapProcessingMode())
return BootstrapTransactionId;
LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
XLOG (and related) changes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.
2001-03-13 02:17:06 +01:00
xid = ShmemVariableCache->nextXid;
/*----------
2005-10-15 04:49:52 +02:00
* Check to see if it's safe to assign another XID. This protects against
* catastrophic data loss due to XID wraparound. The basic rules are:
*
* If we're past xidVacLimit, start trying to force autovacuum cycles.
* If we're past xidWarnLimit, start issuing warnings.
* If we're past xidStopLimit, refuse to execute transactions, unless
* we are running in a standalone backend (which gives an escape hatch
* to the DBA who somehow got past the earlier defenses).
*
* Test is coded to fall out as fast as possible during normal operation,
* ie, when the vac limit is set and we haven't violated it.
*----------
*/
if (TransactionIdFollowsOrEquals(xid, ShmemVariableCache->xidVacLimit) &&
TransactionIdIsValid(ShmemVariableCache->xidVacLimit))
{
/*
* To avoid swamping the postmaster with signals, we issue the
* autovac request only once per 64K transaction starts. This
* still gives plenty of chances before we get into real trouble.
*/
if (IsUnderPostmaster && (xid % 65536) == 0)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC);
if (IsUnderPostmaster &&
2005-10-15 04:49:52 +02:00
TransactionIdFollowsOrEquals(xid, ShmemVariableCache->xidStopLimit))
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
2005-10-29 02:31:52 +02:00
errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
NameStr(ShmemVariableCache->limit_datname)),
2005-10-29 02:31:52 +02:00
errhint("Stop the postmaster and use a standalone backend to vacuum database \"%s\".",
NameStr(ShmemVariableCache->limit_datname))));
else if (TransactionIdFollowsOrEquals(xid, ShmemVariableCache->xidWarnLimit))
ereport(WARNING,
2005-10-15 04:49:52 +02:00
(errmsg("database \"%s\" must be vacuumed within %u transactions",
NameStr(ShmemVariableCache->limit_datname),
ShmemVariableCache->xidWrapLimit - xid),
errhint("To avoid a database shutdown, execute a full-database VACUUM in \"%s\".",
NameStr(ShmemVariableCache->limit_datname))));
}
/*
2004-08-29 07:07:03 +02:00
* If we are allocating the first XID of a new page of the commit log,
2005-10-15 04:49:52 +02:00
* zero out that commit-log page before returning. We must do this while
* holding XidGenLock, else another xact could acquire and commit a later
* XID before we zero the page. Fortunately, a page of the commit log
* holds 32K or more transactions, so we don't have to do this very often.
*
* Extend pg_subtrans too.
*/
ExtendCLOG(xid);
ExtendSUBTRANS(xid);
/*
2005-10-15 04:49:52 +02:00
* Now advance the nextXid counter. This must not happen until after we
* have successfully completed ExtendCLOG() --- if that routine fails, we
* want the next incoming transaction to try it again. We cannot assign
* more XIDs until there is CLOG space for them.
*/
TransactionIdAdvance(ShmemVariableCache->nextXid);
/*
2005-10-15 04:49:52 +02:00
* We must store the new XID into the shared PGPROC array before releasing
* XidGenLock. This ensures that when GetSnapshotData calls
* ReadNewTransactionId, all active XIDs before the returned value of
2005-10-15 04:49:52 +02:00
* nextXid are already present in PGPROC. Else we have a race condition.
*
* XXX by storing xid into MyProc without acquiring ProcArrayLock, we are
* relying on fetch/store of an xid to be atomic, else other backends
* might see a partially-set xid here. But holding both locks at once
2005-10-15 04:49:52 +02:00
* would be a nasty concurrency hit (and in fact could cause a deadlock
* against GetSnapshotData). So for now, assume atomicity. Note that
* readers of PGPROC xid field should be careful to fetch the value only
* once, rather than assume they can read it multiple times and get the
* same answer each time.
*
* The same comments apply to the subxact xid count and overflow fields.
*
2005-10-15 04:49:52 +02:00
* A solution to the atomic-store problem would be to give each PGPROC its
* own spinlock used only for fetching/storing that PGPROC's xid and
* related fields.
*
* If there's no room to fit a subtransaction XID into PGPROC, set the
* cache-overflowed flag instead. This forces readers to look in
2005-10-15 04:49:52 +02:00
* pg_subtrans to map subtransaction XIDs up to top-level XIDs. There is a
* race-condition window, in that the new XID will not appear as running
* until its parent link has been placed into pg_subtrans. However, that
* will happen before anyone could possibly have a reason to inquire about
* the status of the XID, so it seems OK. (Snapshots taken during this
* window *will* include the parent XID, so they will deliver the correct
* answer later on when someone does have a reason to inquire.)
*/
if (MyProc != NULL)
{
/*
* Use volatile pointer to prevent code rearrangement; other backends
2006-10-04 02:30:14 +02:00
* could be examining my subxids info concurrently, and we don't want
* them to see an invalid intermediate state, such as incrementing
* nxids before filling the array entry. Note we are assuming that
* TransactionId and int fetch/store are atomic.
*/
volatile PGPROC *myproc = MyProc;
if (!isSubXact)
myproc->xid = xid;
else
{
2006-10-04 02:30:14 +02:00
int nxids = myproc->subxids.nxids;
if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
{
myproc->subxids.xids[nxids] = xid;
myproc->subxids.nxids = nxids + 1;
}
else
myproc->subxids.overflowed = true;
}
}
LWLockRelease(XidGenLock);
return xid;
}
/*
XLOG (and related) changes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.
2001-03-13 02:17:06 +01:00
* Read nextXid but don't allocate it.
*/
TransactionId
ReadNewTransactionId(void)
{
TransactionId xid;
LWLockAcquire(XidGenLock, LW_SHARED);
xid = ShmemVariableCache->nextXid;
LWLockRelease(XidGenLock);
return xid;
}
/*
* Determine the last safe XID to allocate given the currently oldest
* datfrozenxid (ie, the oldest XID that might exist in any database
* of our cluster).
*/
void
SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
Name oldest_datname)
{
TransactionId xidVacLimit;
TransactionId xidWarnLimit;
TransactionId xidStopLimit;
TransactionId xidWrapLimit;
TransactionId curXid;
Assert(TransactionIdIsNormal(oldest_datfrozenxid));
/*
* The place where we actually get into deep trouble is halfway around
* from the oldest potentially-existing XID. (This calculation is
* probably off by one or two counts, because the special XIDs reduce the
* size of the loop a little bit. But we throw in plenty of slop below,
* so it doesn't matter.)
*/
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
if (xidWrapLimit < FirstNormalTransactionId)
xidWrapLimit += FirstNormalTransactionId;
/*
2005-10-15 04:49:52 +02:00
* We'll refuse to continue assigning XIDs in interactive mode once we get
* within 1M transactions of data loss. This leaves lots of room for the
* DBA to fool around fixing things in a standalone backend, while not
* being significant compared to total XID space. (Note that since
* vacuuming requires one transaction per table cleaned, we had better be
* sure there's lots of XIDs left...)
*/
xidStopLimit = xidWrapLimit - 1000000;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;
/*
2005-10-15 04:49:52 +02:00
* We'll start complaining loudly when we get within 10M transactions of
* the stop point. This is kind of arbitrary, but if you let your gas
* gauge get down to 1% of full, would you be looking for the next gas
* station? We need to be fairly liberal about this number because there
* are lots of scenarios where most transactions are done by automatic
* clients that won't pay attention to warnings. (No, we're not gonna make
* this configurable. If you know enough to configure it, you know enough
* to not get in this kind of trouble in the first place.)
*/
xidWarnLimit = xidStopLimit - 10000000;
if (xidWarnLimit < FirstNormalTransactionId)
xidWarnLimit -= FirstNormalTransactionId;
/*
* We'll start trying to force autovacuums when oldest_datfrozenxid
* gets to be more than autovacuum_freeze_max_age transactions old.
*
* Note: guc.c ensures that autovacuum_freeze_max_age is in a sane
* range, so that xidVacLimit will be well before xidWarnLimit.
*
* Note: autovacuum_freeze_max_age is a PGC_POSTMASTER parameter so that
* we don't have to worry about dealing with on-the-fly changes in its
* value. It doesn't look practical to update shared state from a GUC
* assign hook (too many processes would try to execute the hook,
* resulting in race conditions as well as crashes of those not
* connected to shared memory). Perhaps this can be improved someday.
*/
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
if (xidVacLimit < FirstNormalTransactionId)
xidVacLimit += FirstNormalTransactionId;
/* Grab lock for just long enough to set the new limit values */
LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
ShmemVariableCache->oldestXid = oldest_datfrozenxid;
ShmemVariableCache->xidVacLimit = xidVacLimit;
ShmemVariableCache->xidWarnLimit = xidWarnLimit;
ShmemVariableCache->xidStopLimit = xidStopLimit;
ShmemVariableCache->xidWrapLimit = xidWrapLimit;
namecpy(&ShmemVariableCache->limit_datname, oldest_datname);
curXid = ShmemVariableCache->nextXid;
LWLockRelease(XidGenLock);
/* Log the info */
ereport(DEBUG1,
2005-10-15 04:49:52 +02:00
(errmsg("transaction ID wrap limit is %u, limited by database \"%s\"",
xidWrapLimit, NameStr(*oldest_datname))));
/*
* If past the autovacuum force point, immediately signal an autovac
* request. The reason for this is that autovac only processes one
* database per invocation. Once it's finished cleaning up the oldest
* database, it'll call here, and we'll signal the postmaster to start
* another iteration immediately if there are still any old databases.
*/
if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC);
/* Give an immediate warning if past the wrap warn point */
if (TransactionIdFollowsOrEquals(curXid, xidWarnLimit))
ereport(WARNING,
2005-10-15 04:49:52 +02:00
(errmsg("database \"%s\" must be vacuumed within %u transactions",
NameStr(*oldest_datname),
xidWrapLimit - curXid),
errhint("To avoid a database shutdown, execute a full-database VACUUM in \"%s\".",
NameStr(*oldest_datname))));
}
/*
* GetNewObjectId -- allocate a new OID
*
* OIDs are generated by a cluster-wide counter. Since they are only 32 bits
* wide, counter wraparound will occur eventually, and therefore it is unwise
* to assume they are unique unless precautions are taken to make them so.
* Hence, this routine should generally not be used directly. The only
* direct callers should be GetNewOid() and GetNewRelFileNode() in
* catalog/catalog.c.
*/
Oid
GetNewObjectId(void)
{
Oid result;
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
/*
* Check for wraparound of the OID counter. We *must* not return 0
* (InvalidOid); and as long as we have to check that, it seems a good
* idea to skip over everything below FirstNormalObjectId too. (This
* basically just avoids lots of collisions with bootstrap-assigned OIDs
* right after a wrap occurs, so as to avoid a possibly large number of
* iterations in GetNewOid.) Note we are relying on unsigned comparison.
*
* During initdb, we start the OID generator at FirstBootstrapObjectId, so
* we only enforce wrapping to that point when in bootstrap or standalone
2005-10-15 04:49:52 +02:00
* mode. The first time through this routine after normal postmaster
* start, the counter will be forced up to FirstNormalObjectId. This
* mechanism leaves the OIDs between FirstBootstrapObjectId and
* FirstNormalObjectId available for automatic assignment during initdb,
* while ensuring they will never conflict with user-assigned OIDs.
*/
if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId))
{
if (IsPostmasterEnvironment)
{
/* wraparound in normal environment */
ShmemVariableCache->nextOid = FirstNormalObjectId;
ShmemVariableCache->oidCount = 0;
}
else
{
/* we may be bootstrapping, so don't enforce the full range */
if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))
{
/* wraparound in standalone environment? */
ShmemVariableCache->nextOid = FirstBootstrapObjectId;
ShmemVariableCache->oidCount = 0;
}
}
}
XLOG (and related) changes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.
2001-03-13 02:17:06 +01:00
/* If we run out of logged for use oids then we must log more */
2000-11-30 09:46:26 +01:00
if (ShmemVariableCache->oidCount == 0)
{
2000-11-30 09:46:26 +01:00
XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH);
ShmemVariableCache->oidCount = VAR_OID_PREFETCH;
}
result = ShmemVariableCache->nextOid;
2000-11-30 09:46:26 +01:00
(ShmemVariableCache->nextOid)++;
(ShmemVariableCache->oidCount)--;
LWLockRelease(OidGenLock);
return result;
}