Protect against multixact members wraparound

Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below), the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore.  This leads to loss of files that
are critical to interpret existing tuples' Xmax values.

To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten.  This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough.  We start emitting
warnings some time earlier, so that the eventual multixact shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments remain before shutdown).
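
A standalone C sketch of the members-space check described above may help. It reproduces the logic of MultiXactOffsetWouldWrap() and of the stop/warn tests in GetNewMultiXactId() from the diff below. The helper names (offset_would_wrap, stop_limit_for, check_members) are hypothetical, and the geometry is an assumption about the 9.3-era layout: 1636 members per page with the default 8 kB BLCKSZ, and 32 pages per segment. stop_limit_for() rounds down to the segment start with a remainder (offset - offset % segment_size), which is what the "move back to start of the corresponding segment" comment in DetermineSafeOldestOffset() describes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/* Assumed geometry: 8 kB pages of 409 groups x 4 members, 32 pages/segment. */
#define MEMBERS_PER_PAGE    1636u
#define PAGES_PER_SEGMENT   32u
#define MEMBERS_PER_SEGMENT (MEMBERS_PER_PAGE * PAGES_PER_SEGMENT)
#define WARN_SEGMENTS       20u

/* Same logic as MultiXactOffsetWouldWrap(): does start + distance reach boundary? */
static bool
offset_would_wrap(MultiXactOffset boundary, MultiXactOffset start,
                  uint32_t distance)
{
    MultiXactOffset finish = start + distance;  /* unsigned addition wraps mod 2^32 */

    if (finish < start)
        finish++;                               /* offset 0 is never used; skip it */
    if (start < boundary)
        return finish >= boundary || finish < start;
    return finish >= boundary && finish < start;
}

/* Round the oldest needed offset down to its segment start, then back off one segment. */
static MultiXactOffset
stop_limit_for(MultiXactOffset oldest_offset)
{
    MultiXactOffset seg_start = oldest_offset - oldest_offset % MEMBERS_PER_SEGMENT;

    /* unsigned subtraction wraps mod 2^32 when seg_start is 0 */
    return seg_start - MEMBERS_PER_SEGMENT;
}

typedef enum { MEMBERS_OK, MEMBERS_WARN, MEMBERS_STOP } MembersCheck;

/* Classify a request for nmembers new members starting at next_offset. */
static MembersCheck
check_members(MultiXactOffset stop_limit, MultiXactOffset next_offset,
              uint32_t nmembers)
{
    if (offset_would_wrap(stop_limit, next_offset, nmembers))
        return MEMBERS_STOP;                    /* the server raises ERROR here */
    if (offset_would_wrap(stop_limit, next_offset,
                          nmembers + MEMBERS_PER_SEGMENT * WARN_SEGMENTS))
        return MEMBERS_WARN;                    /* the server emits WARNING here */
    return MEMBERS_OK;
}
```

Under these assumptions a segment holds 52352 members, so the 20-segment warning window covers 1,047,040 members.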

On the troublesome average multixact size: the threshold depends on the
multixact freeze parameters.  The age of the oldest multixact that must be
kept is governed by the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum.  If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
	(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact with the default settings.  Having an
average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts.  (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeze_max_age instead, at which point multixact generation
is stopped completely.  The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).
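
The threshold formula can be checked with a few lines of C. The defaults used in the check (150 million for the freeze table age and 5 million for the freeze min age) are assumptions about the server's default settings, not part of this commit. The 400 million figure is the limit default quoted above. The helper names are hypothetical.

```c
#include <stdint.h>

/* The members space offers 2^32 - 1 usable offsets (offset 0 is never used). */
#define MEMBER_OFFSET_SPACE 4294967295u

static uint32_t
max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/*
 * Average members per multixact above which new member data would overrun
 * data still needed by the oldest retained multixact, given how many
 * multixacts are retained.  Integer division truncates the result.
 */
static uint32_t
troublesome_avg_members(uint32_t retained_multixacts)
{
    return MEMBER_OFFSET_SPACE / retained_multixacts;
}
```

With the assumed defaults, troublesome_avg_members(max_u32(150000000u, 5000000u)) evaluates to 28, and troublesome_avg_members(400000000u) evaluates to 10. Both figures match the ones given in the text.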

Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.

Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
Alvaro Herrera 2015-04-28 11:32:53 -03:00
parent dfbaed4597
commit b69bf30b9b
2 changed files with 187 additions and 25 deletions


@@ -213,6 +213,9 @@ typedef struct MultiXactStateData
MultiXactId multiStopLimit;
MultiXactId multiWrapLimit;
/* support for members anti-wraparound measures */
MultiXactOffset offsetStopLimit;
/*
* Per-backend data starts here. We have two arrays stored in the area
* immediately following the MultiXactStateData struct. Each is indexed by
@@ -341,6 +344,10 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
MultiXactOffset offset2);
static void ExtendMultiXactOffset(MultiXactId multi);
static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
MultiXactOffset start, uint32 distance);
static MultiXactOffset read_offset_for_multi(MultiXactId multi);
static void WriteMZeroPageXlogRec(int pageno, uint8 info);
@@ -967,7 +974,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
/*
* To avoid swamping the postmaster with signals, we issue the autovac
* request only once per 64K transaction starts. This still gives
* request only once per 64K multis generated. This still gives
* plenty of chances before we get into real trouble.
*/
if (IsUnderPostmaster && (result % 65536) == 0)
@@ -1043,6 +1050,47 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
else
*offset = nextOffset;
/*----------
* Protect against overrun of the members space as well, with the
* following rules:
*
* If we're past offsetStopLimit, refuse to generate more multis.
* If we're close to offsetStopLimit, emit a warning.
*
* Arbitrarily, we start emitting warnings when we're 20 segments or less
* from offsetStopLimit.
*
* Note we haven't updated the shared state yet, so if we fail at this
* point, the multixact ID we grabbed can still be used by the next guy.
*
* Note that there is no point in forcing autovacuum runs here: the
* multixact freeze settings would have to be reduced for that to have any
* effect.
*----------
*/
#define OFFSET_WARN_SEGMENTS 20
if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
nmembers))
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("multixact \"members\" limit exceeded"),
errdetail_plural("This command would create a multixact with %u members, which exceeds remaining space (%u member.)",
"This command would create a multixact with %u members, which exceeds remaining space (%u members.)",
MultiXactState->offsetStopLimit - nextOffset - 1,
nmembers,
MultiXactState->offsetStopLimit - nextOffset - 1),
errhint("Execute a database-wide VACUUM in database with OID %u, with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.",
MultiXactState->oldestMultiXactDB)));
else if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
nextOffset,
nmembers + MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT * OFFSET_WARN_SEGMENTS))
ereport(WARNING,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("database with OID %u must be vacuumed before %d more multixact members are used",
MultiXactState->oldestMultiXactDB,
MultiXactState->offsetStopLimit - nextOffset + nmembers),
errhint("Execute a database-wide VACUUM in that database, with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.")));
ExtendMultiXactMember(nextOffset, nmembers);
/*
@@ -1899,6 +1947,12 @@ StartupMultiXact(void)
*/
pageno = MXOffsetToMemberPage(offset);
MultiXactMemberCtl->shared->latest_page_number = pageno;
/*
* compute the oldest member we need to keep around to avoid old member
* data overrun.
*/
DetermineSafeOldestOffset(MultiXactState->oldestMultiXactId);
}
/*
@@ -1992,6 +2046,8 @@ TrimMultiXact(void)
}
LWLockRelease(MultiXactMemberControlLock);
DetermineSafeOldestOffset(MultiXactState->oldestMultiXactId);
}
/*
@@ -2099,7 +2155,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
*
* Note: This differs from the magic number used in
* SetTransactionIdLimit() since vacuum itself will never generate new
* multis.
* multis. XXX actually it does, if it needs to freeze old multis.
*/
multiStopLimit = multiWrapLimit - 100;
if (multiStopLimit < FirstMultiXactId)
@@ -2142,6 +2198,8 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
curMulti = MultiXactState->nextMXact;
LWLockRelease(MultiXactGenLock);
DetermineSafeOldestOffset(oldest_datminmxid);
/* Log the info */
ereport(DEBUG1,
(errmsg("MultiXactId wrap limit is %u, limited by database with OID %u",
@@ -2228,13 +2286,16 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
/*
* Update our oldestMultiXactId value, but only if it's more recent than
* what we had.
* what we had. However, even if not, always update the oldest multixact
* offset limit.
*/
void
MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
{
if (MultiXactIdPrecedes(MultiXactState->oldestMultiXactId, oldestMulti))
SetMultiXactIdLimit(oldestMulti, oldestMultiDB);
else
DetermineSafeOldestOffset(oldestMulti);
}
/*
@@ -2401,6 +2462,121 @@ GetOldestMultiXactId(void)
return oldestMXact;
}
/*
* Based on the given oldest MultiXactId, determine what's the oldest member
* offset and install the limit info in MultiXactState, where it can be used to
* prevent overrun of old data in the members SLRU area.
*/
static void
DetermineSafeOldestOffset(MultiXactId oldestMXact)
{
MultiXactOffset oldestOffset;
/*
* Can't do this while initdb'ing or in the startup process while
* replaying WAL: the segment file to read might have not yet been
* created, or already been removed.
*/
if (IsBootstrapProcessingMode() || InRecovery)
return;
/*
* We determine the safe upper bound for offsets of new xacts by reading
* the offset of the oldest multixact, and going back one segment. This
* way, the sequence of multixact member segments will always have a
* one-segment hole at a minimum. We start spewing warnings a few
* complete segments before that.
*/
oldestOffset = read_offset_for_multi(oldestMXact);
/* move back to start of the corresponding segment */
oldestOffset -= oldestOffset / MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT;
LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
/* always leave one segment before the wraparound point */
MultiXactState->offsetStopLimit = oldestOffset -
(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
LWLockRelease(MultiXactGenLock);
}
/*
* Return whether adding "distance" to "start" would move past "boundary".
*
* We use this to determine whether the addition is "wrapping around" the
* boundary point, hence the name. The reason we don't want to use the regular
* 2^31-modulo arithmetic here is that we want to be able to use the whole of
the 2^32-1 space here, allowing for more multixacts than would fit
* otherwise. See also SlruScanDirCbRemoveMembers.
*/
static bool
MultiXactOffsetWouldWrap(MultiXactOffset boundary, MultiXactOffset start,
uint32 distance)
{
MultiXactOffset finish;
Assert(distance >= 0);
/*
* Note that offset number 0 is not used (see GetMultiXactIdMembers), so
* if the addition wraps around the UINT_MAX boundary, skip that value.
*/
finish = start + distance;
if (finish < start)
finish++;
/*-----------------------------------------------------------------------
* When the boundary is numerically greater than the starting point, any
* value numerically between the two is not wrapped:
*
* <----S----B---->
* [---)            = F wrapped past B (and UINT_MAX)
*      [---)       = F not wrapped
*           [----] = F wrapped past B
*
* When the boundary is numerically less than the starting point (i.e. the
* UINT_MAX wraparound occurs somewhere in between) then all values in
* between are wrapped:
*
* <----B----S---->
* [---)            = F not wrapped past B (but wrapped past UINT_MAX)
*      [---)       = F wrapped past B (and UINT_MAX)
*           [----] = F not wrapped
*-----------------------------------------------------------------------
*/
if (start < boundary)
{
return finish >= boundary || finish < start;
}
else
{
return finish >= boundary && finish < start;
}
}
/*
* Read the offset of the first member of the given multixact.
*/
static MultiXactOffset
read_offset_for_multi(MultiXactId multi)
{
MultiXactOffset offset;
int pageno;
int entryno;
int slotno;
MultiXactOffset *offptr;
pageno = MultiXactIdToOffsetPage(multi);
entryno = MultiXactIdToOffsetEntry(multi);
/* lock is acquired by SimpleLruReadPage_ReadOnly */
slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
LWLockRelease(MultiXactOffsetControlLock);
return offset;
}
/*
* SlruScanDirectory callback.
* This callback deletes segments that are outside the range determined by
@@ -2533,26 +2709,7 @@ TruncateMultiXact(void)
* First, compute the safe truncation point for MultiXactMember. This is
* the starting offset of the oldest multixact.
*/
{
int pageno;
int slotno;
int entryno;
MultiXactOffset *offptr;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
pageno = MultiXactIdToOffsetPage(oldestMXact);
entryno = MultiXactIdToOffsetEntry(oldestMXact);
slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno,
oldestMXact);
offptr = (MultiXactOffset *)
MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
oldestOffset = *offptr;
LWLockRelease(MultiXactOffsetControlLock);
}
oldestOffset = read_offset_for_multi(oldestMXact);
/*
* To truncate MultiXactMembers, we need to figure out the active page


@@ -397,6 +397,12 @@ AuxiliaryProcessMain(int argc, char *argv[])
proc_exit(1); /* should never return */
case BootstrapProcess:
/*
* There was a brief instant during which mode was Normal; this is
* okay. We need to be in bootstrap mode during BootStrapXLOG for
* the sake of multixact initialization.
*/
SetProcessingMode(BootstrapProcessing);
bootstrap_signals();
BootStrapXLOG();
BootstrapModeMain();
@@ -459,8 +465,7 @@ BootstrapModeMain(void)
int i;
Assert(!IsUnderPostmaster);
SetProcessingMode(BootstrapProcessing);
Assert(IsBootstrapProcessingMode());
/*
* Do backend-like initialization for bootstrap mode