Request XLOG switch before writing checkpoint in pg_start_backup(). Otherwise

you can end up with an unrecoverable backup if you start a new base backup
right after finishing archive recovery. In that scenario, the redo pointer of
the checkpoint that pg_start_backup() writes points to the XLOG segment where
the timeline-changing end-of-archive-recovery checkpoint is. The beginning
of that segment contains pages with the old timeline ID, and we don't accept
that in recovery unless we find a history file covering the old timeline ID.
If you omit pg_xlog from the base backup and clear the archive directory
before starting the backup, there will be no such history file available.

The bug is present in all versions since PITR was introduced in 8.0, but I'm
back-patching only back to 8.2. Earlier versions didn't have XLOG switch
records, making this fix unfeasible. Given the lack of reports until now,
it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1.

Per report and suggestion by Mikael Krantz
This commit is contained in:
Heikki Linnakangas 2009-05-07 11:25:25 +00:00
parent 1f36feceb0
commit 223431cba1
1 changed files with 14 additions and 1 deletions

View File

@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.336 2009/04/22 19:51:12 heikki Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.337 2009/05/07 11:25:25 heikki Exp $
*
*-------------------------------------------------------------------------
*/
@ -6987,6 +6987,19 @@ pg_start_backup(PG_FUNCTION_ARGS)
XLogCtl->Insert.forcePageWrites = true;
LWLockRelease(WALInsertLock);
/*
* Force an XLOG file switch before the checkpoint, to ensure that the WAL
* segment the checkpoint is written to doesn't contain pages with old
* timeline IDs. That would otherwise happen if you called
* pg_start_backup() right after restoring from a PITR archive: the first
* WAL segment containing the startup checkpoint has pages in the
* beginning with the old timeline ID. That can cause trouble at recovery:
* we won't have a history file covering the old timeline if pg_xlog
* directory was not included in the base backup and the WAL archive was
* cleared too before starting the backup.
*/
RequestXLogSwitch();
/* Ensure we release forcePageWrites if fail below */
PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
{