/*-------------------------------------------------------------------------
 *
 * timeline.c
 *		Functions for reading and writing timeline history files.
 *
 * A timeline history file lists the timeline switches in the history of a
 * timeline, in a simple text format. These files are archived along with
 * the WAL segments.
 *
 * The files are named like "<tli>.history". For example, if the database
 * starts up and switches to timeline 5, the timeline history file would be
 * called "00000005.history".
 *
 * Each line in the file represents a timeline switch:
 *
 * <parentTLI> <switchpoint> <reason>
 *
 *   parentTLI    ID of the parent timeline
 *   switchpoint  XLogRecPtr of the WAL location where the switch happened
 *   reason       human-readable explanation of why the timeline was changed
 *
 * The fields are separated by tabs. Lines beginning with # are comments, and
 * are ignored. Empty lines are also ignored.
 *
 * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/backend/access/transam/timeline.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include <sys/stat.h>
#include <unistd.h>

#include "access/timeline.h"
#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "access/xlogarchive.h"
#include "access/xlogdefs.h"
#include "pgstat.h"
#include "storage/fd.h"

/*
 * Copies all timeline history files with IDs from 'begin' (inclusive) up to
 * 'end' (exclusive) from the archive to pg_wal.
 */
void
restoreTimeLineHistoryFiles(TimeLineID begin, TimeLineID end)
{
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	TimeLineID	tli;

	for (tli = begin; tli < end; tli++)
	{
		/* Timeline 1 does not have a history file */
		if (tli == 1)
			continue;

		TLHistoryFileName(histfname, tli);
		if (RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false))
			KeepFileRestoredFromArchive(path, histfname);
	}
}

/*
 * Try to read a timeline's history file.
 *
 * If successful, return the list of component TLIs (the given TLI followed by
 * its ancestor TLIs). If we can't find the history file, assume that the
 * timeline has no parents, and return a list of just the specified timeline
 * ID.
 */
List *
readTimeLineHistory(TimeLineID targetTLI)
{
	List	   *result;
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	FILE	   *fd;
	TimeLineHistoryEntry *entry;
	TimeLineID	lasttli = 0;
	XLogRecPtr	prevend;
	bool		fromArchive = false;

	/* Timeline 1 does not have a history file, so no need to check */
	if (targetTLI == 1)
	{
		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = targetTLI;
		entry->begin = entry->end = InvalidXLogRecPtr;
		return list_make1(entry);
	}

	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, targetTLI);
		fromArchive =
			RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, targetTLI);

	fd = AllocateFile(path, "r");
	if (fd == NULL)
	{
		if (errno != ENOENT)
			ereport(FATAL,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		/* Not there, so assume no parents */
		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = targetTLI;
		entry->begin = entry->end = InvalidXLogRecPtr;
		return list_make1(entry);
	}

	result = NIL;

	/*
	 * Parse the file...
	 */
	prevend = InvalidXLogRecPtr;
	for (;;)
	{
		char		fline[MAXPGPATH];
		char	   *res;
		char	   *ptr;
		TimeLineID	tli;
		uint32		switchpoint_hi;
		uint32		switchpoint_lo;
		int			nfields;

		pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
		res = fgets(fline, sizeof(fline), fd);
		pgstat_report_wait_end();
		if (res == NULL)
		{
			if (ferror(fd))
				ereport(ERROR,
						(errcode_for_file_access(),
						 errmsg("could not read file \"%s\": %m", path)));

			break;
		}

		/* skip leading whitespace and check for # comment */
		for (ptr = fline; *ptr; ptr++)
		{
			if (!isspace((unsigned char) *ptr))
				break;
		}
		if (*ptr == '\0' || *ptr == '#')
			continue;

		/* the switchpoint is stored as two hex words, "<hi>/<lo>" */
		nfields = sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, &switchpoint_lo);

		if (nfields < 1)
		{
			/* expect a numeric timeline ID as first field of line */
			ereport(FATAL,
					(errmsg("syntax error in history file: %s", fline),
					 errhint("Expected a numeric timeline ID.")));
		}
		if (nfields != 3)
			ereport(FATAL,
					(errmsg("syntax error in history file: %s", fline),
					 errhint("Expected a write-ahead log switchpoint location.")));

		if (result && tli <= lasttli)
			ereport(FATAL,
					(errmsg("invalid data in history file: %s", fline),
					 errhint("Timeline IDs must be in increasing sequence.")));

		lasttli = tli;

		/* each timeline begins where its predecessor ended */
		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = tli;
		entry->begin = prevend;
		entry->end = ((uint64) (switchpoint_hi)) << 32 | (uint64) switchpoint_lo;
		prevend = entry->end;

		/* Build list with newest item first */
		result = lcons(entry, result);

		/* we ignore the remainder of each line */
	}

	FreeFile(fd);

	if (result && targetTLI <= lasttli)
		ereport(FATAL,
				(errmsg("invalid data in history file \"%s\"", path),
				 errhint("Timeline IDs must be less than child timeline's ID.")));

	/*
	 * Create one more entry for the "tip" of the timeline, which has no entry
	 * in the history file.
	 */
	entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
	entry->tli = targetTLI;
	entry->begin = prevend;
	entry->end = InvalidXLogRecPtr;

	result = lcons(entry, result);

	/*
	 * If the history file was fetched from archive, save it in pg_wal for
	 * future reference.
	 */
	if (fromArchive)
		KeepFileRestoredFromArchive(path, histfname);

	return result;
}

/*
 * Probe whether a timeline history file exists for the given timeline ID
 */
bool
existsTimeLineHistory(TimeLineID probeTLI)
{
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	FILE	   *fd;

	/* Timeline 1 does not have a history file, so no need to check */
	if (probeTLI == 1)
		return false;

	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, probeTLI);
		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, probeTLI);

	fd = AllocateFile(path, "r");
	if (fd != NULL)
	{
		FreeFile(fd);
		return true;
	}
	else
	{
		if (errno != ENOENT)
			ereport(FATAL,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		return false;
	}
}

/*
 * Find the newest existing timeline, assuming that startTLI exists.
 *
 * Note: while this is somewhat heuristic, it does positively guarantee
 * that (result + 1) is not a known timeline, and therefore it should
 * be safe to assign that ID to a new timeline.
 */
TimeLineID
findNewestTimeLine(TimeLineID startTLI)
{
	TimeLineID	newestTLI;
	TimeLineID	probeTLI;

	/*
	 * The algorithm is just to probe for the existence of timeline history
	 * files. XXX is it useful to allow gaps in the sequence?
	 */
	newestTLI = startTLI;

	for (probeTLI = startTLI + 1;; probeTLI++)
	{
		if (existsTimeLineHistory(probeTLI))
		{
			newestTLI = probeTLI;	/* probeTLI exists */
		}
		else
		{
			/* doesn't exist, assume we're done */
			break;
		}
	}

	return newestTLI;
}

/*
 * Create a new timeline history file.
 *
 *	newTLI: ID of the new timeline
 *	parentTLI: ID of its immediate parent
 *	switchpoint: WAL location where the system switched to the new timeline
 *	reason: human-readable explanation of why the timeline was switched
 *
 * Currently this is only used at the end of recovery, and so there are no
 * locking considerations. But we should be just as careful as XLogFileInit
 * to avoid putting a bogus file in place.
 */
void
writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
					 XLogRecPtr switchpoint, char *reason)
{
	char		path[MAXPGPATH];
	char		tmppath[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	char		buffer[BLCKSZ];
	int			srcfd;
	int			fd;
	int			nbytes;

	Assert(newTLI > parentTLI); /* else bad selection of newTLI */

	/*
	 * Write into a temp file name.
	 */
	snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());

	unlink(tmppath);

	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
	fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m", tmppath)));

	/*
	 * If a history file exists for the parent, copy it verbatim
	 */
	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, parentTLI);
		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, parentTLI);

	srcfd = OpenTransientFile(path, O_RDONLY);
	if (srcfd < 0)
	{
		if (errno != ENOENT)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		/* Not there, so assume parent has no parents */
	}
	else
	{
		for (;;)
		{
			errno = 0;
			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
			nbytes = (int) read(srcfd, buffer, sizeof(buffer));
			pgstat_report_wait_end();
			if (nbytes < 0 || errno != 0)
				ereport(ERROR,
						(errcode_for_file_access(),
						 errmsg("could not read file \"%s\": %m", path)));
			if (nbytes == 0)
				break;
			errno = 0;
			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
			if ((int) write(fd, buffer, nbytes) != nbytes)
			{
				int			save_errno = errno;

				/*
				 * If we fail to make the file, delete it to release disk
				 * space
				 */
				unlink(tmppath);

				/*
				 * if write didn't set errno, assume problem is no disk space
				 */
				errno = save_errno ? save_errno : ENOSPC;

				ereport(ERROR,
						(errcode_for_file_access(),
						 errmsg("could not write to file \"%s\": %m", tmppath)));
			}
			pgstat_report_wait_end();
		}

		if (CloseTransientFile(srcfd) != 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not close file \"%s\": %m", path)));
	}

/*
|
|
|
|
* Append one line with the details of this timeline split.
|
|
|
|
*
|
|
|
|
* If we did have a parent file, insert an extra newline just in case the
|
|
|
|
* parent file failed to end with one.
|
|
|
|
*/
|
|
|
|
snprintf(buffer, sizeof(buffer),
|
2012-12-04 14:28:58 +01:00
|
|
|
"%s%u\t%X/%X\t%s\n",
|
2012-10-02 12:37:19 +02:00
|
|
|
(srcfd < 0) ? "" : "\n",
|
|
|
|
parentTLI,
|
2021-02-23 10:14:38 +01:00
|
|
|
LSN_FORMAT_ARGS(switchpoint),
|
2012-10-02 12:37:19 +02:00
|
|
|
reason);
|
|
|
|
|
|
|
|
nbytes = strlen(buffer);
|
|
|
|
errno = 0;
|
2020-05-08 03:36:40 +02:00
|
|
|
pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
|
2012-10-02 12:37:19 +02:00
|
|
|
if ((int) write(fd, buffer, nbytes) != nbytes)
|
|
|
|
{
|
|
|
|
int save_errno = errno;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we fail to make the file, delete it to release disk space
|
|
|
|
*/
|
|
|
|
unlink(tmppath);
|
|
|
|
/* if write didn't set errno, assume problem is no disk space */
|
|
|
|
errno = save_errno ? save_errno : ENOSPC;
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not write to file \"%s\": %m", tmppath)));
|
|
|
|
}
|
2020-05-08 03:36:40 +02:00
|
|
|
pgstat_report_wait_end();
|
2012-10-02 12:37:19 +02:00
|
|
|
|
Create and use wait events for read, write, and fsync operations.
Previous commits, notably 53be0b1add7064ca5db3cd884302dfc3268d884e and
6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O. Add wait events for those operations, too.
Rushabh Lathia, with further hacking by me. Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.
Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_SYNC);
|
2012-10-02 12:37:19 +02:00
|
|
|
if (pg_fsync(fd) != 0)
|
PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure. In that case the only remaining copy of the
data is in the WAL. A subsequent fsync() could appear to succeed,
but not have flushed the data. That means that a future checkpoint
could apparently complete successfully but have lost data.
Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure. Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.
Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses. If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail. The GUC defaults
to off, meaning that we panic.
Back-patch to all supported releases.
There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error. A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.
Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
2018-11-19 01:31:10 +01:00
		ereport(data_sync_elevel(ERROR),
2012-10-02 12:37:19 +02:00
				(errcode_for_file_access(),
				 errmsg("could not fsync file \"%s\": %m", tmppath)));
2017-03-18 12:43:01 +01:00
	pgstat_report_wait_end();
2012-10-02 12:37:19 +02:00

2019-07-06 23:18:46 +02:00
	if (CloseTransientFile(fd) != 0)
2012-10-02 12:37:19 +02:00
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close file \"%s\": %m", tmppath)));

	/*
	 * Now move the completed history file into place with its final name.
	 */
	TLHistoryFilePath(path, newTLI);
Replace durable_rename_excl() by durable_rename(), take two
durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), and it falls back to rename() on some
platforms (aka WIN32), which offers no such overwrite protection. Most
callers use durable_rename_excl() just in case there is an existing
file, but in practice there shouldn't be one (see below for more
details).
Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file. As per Nathan's tests, it is
possible to end up with two links to the same file in pg_wal after a
crash just before unlink() during WAL recycling. Specifically, the test
produced links to the same file for the current WAL file and the next
one because the half-recycled WAL file was re-recycled upon restarting,
leading to WAL corruption.
This change replaces all the calls of durable_rename_excl() to
durable_rename(). This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it and ordinarily there shouldn't be one. The function itself
is left around in case any extensions are using it. It will be removed
on HEAD via a follow-up commit.
Here is a summary of the existing callers of durable_rename_excl() (see
second discussion link at the bottom), replaced by this commit. First,
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms. Second, xlog.c uses it to recycle
past WAL segments, where an overwrite should not happen (origin of the
change at f0e37a8) because there are protections about the WAL segment
to select when recycling an entry. The third and last area is related
to the write of timeline history files. writeTimeLineHistory() will
write a new timeline history file at the end of recovery on promotion,
so there should be no such files for the same timeline.
What remains is writeTimeLineHistoryFile(), that can be used in parallel
by a WAL receiver and the startup process, and some digging of the
buildfarm shows that EEXIST from a WAL receiver can happen with an error
of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\",
which would cause an automatic restart of the WAL receiver as it is
promoted to FATAL, hence this should improve the stability of the WAL
receiver as rename() would overwrite an existing TLI history file
already fetched by the startup process at recovery.
This is a bug fix, but knowing the unlikeliness of the problem involving
one or more crashes at an exceptionally bad moment, no backpatch is
done. Also, I want to be careful with such changes (aaa3aed did the
opposite of this change by removing HAVE_WORKING_LINK so as Windows
would do a link() rather than a rename() but this was not
concurrent-safe). A backpatch could be revisited in the future. This
is the second time this change is attempted, ccfbd92 being the first
one, but this time no assertions are added for the case of a TLI history
file written concurrently by the WAL receiver or the startup process
because we can expect one to exist (some of the TAP tests are able to
trigger with a proper timing).
Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz
2022-07-05 03:16:12 +02:00
	Assert(access(path, F_OK) != 0 && errno == ENOENT);
	durable_rename(tmppath, path, ERROR);
2012-10-03 08:08:13 +02:00

	/* The history file can be archived immediately. */
2014-11-06 13:24:40 +01:00
	if (XLogArchivingActive())
	{
		TLHistoryFileName(histfname, newTLI);
2021-09-04 18:14:30 +02:00
		XLogArchiveNotify(histfname);
2014-11-06 13:24:40 +01:00
	}
2012-10-02 12:37:19 +02:00
}
2012-12-04 14:28:58 +01:00

Allow a streaming replication standby to follow a timeline switch.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.
There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline reached, putting the walsender
back into accepting a replication command.
Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
2012-12-13 18:00:00 +01:00
/*
 * Writes a history file for given timeline and contents.
 *
 * Currently this is only used in the walreceiver process, and so there are
 * no locking considerations. But we should be just as tense as XLogFileInit
 * to avoid emplacing a bogus file.
 */
void
writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
{
	char path[MAXPGPATH];
	char tmppath[MAXPGPATH];
	int fd;

	/*
	 * Write into a temp file name.
	 */
	snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());

	unlink(tmppath);

	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
2017-09-23 15:49:22 +02:00
	fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL);
2012-12-13 18:00:00 +01:00
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m", tmppath)));

	errno = 0;
2017-03-18 12:43:01 +01:00
	pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE);
2012-12-13 18:00:00 +01:00
	if ((int) write(fd, content, size) != size)
	{
		int save_errno = errno;

		/*
		 * If we fail to make the file, delete it to release disk space
		 */
		unlink(tmppath);
		/* if write didn't set errno, assume problem is no disk space */
		errno = save_errno ? save_errno : ENOSPC;

		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write to file \"%s\": %m", tmppath)));
	}
2017-03-18 12:43:01 +01:00
	pgstat_report_wait_end();
2012-12-13 18:00:00 +01:00
2017-03-18 12:43:01 +01:00
	pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC);
2012-12-13 18:00:00 +01:00
	if (pg_fsync(fd) != 0)
2018-11-19 01:31:10 +01:00
		ereport(data_sync_elevel(ERROR),
2012-12-13 18:00:00 +01:00
				(errcode_for_file_access(),
				 errmsg("could not fsync file \"%s\": %m", tmppath)));
2017-03-18 12:43:01 +01:00
	pgstat_report_wait_end();
2012-12-13 18:00:00 +01:00

2019-07-06 23:18:46 +02:00
	if (CloseTransientFile(fd) != 0)
2012-12-13 18:00:00 +01:00
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close file \"%s\": %m", tmppath)));

	/*
2022-07-05 03:16:12 +02:00
	 * Now move the completed history file into place with its final name,
	 * replacing any existing file with the same name.
2012-12-13 18:00:00 +01:00
	 */
	TLHistoryFilePath(path, tli);
Replace durable_rename_excl() by durable_rename(), take two
durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), and it falls back to rename() on some
platforms (aka WIN32), which offers no such overwrite protection. Most
callers use durable_rename_excl() just in case there is an existing
file, but in practice there shouldn't be one (see below for more
details).
Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file. As per Nathan's tests, it is
possible to end up with two links to the same file in pg_wal after a
crash just before unlink() during WAL recycling. Specifically, the test
produced links to the same file for the current WAL file and the next
one because the half-recycled WAL file was re-recycled upon restarting,
leading to WAL corruption.
This change replaces all the calls of durable_rename_excl() to
durable_rename(). This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it and ordinarily there shouldn't be one. The function itself
is left around in case any extensions are using it. It will be removed
on HEAD via a follow-up commit.
Here is a summary of the existing callers of durable_rename_excl() (see
second discussion link at the bottom), replaced by this commit. First,
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms. Second, xlog.c uses it to recycle
past WAL segments, where an overwrite should not happen (origin of the
change at f0e37a8) because there are protections about the WAL segment
to select when recycling an entry. The third and last area is related
to the write of timeline history files. writeTimeLineHistory() will
write a new timeline history file at the end of recovery on promotion,
so there should be no such files for the same timeline.
What remains is writeTimeLineHistoryFile(), which can be used in parallel
by a WAL receiver and the startup process. Some digging in the buildfarm
shows that a WAL receiver can hit EEXIST with an error of
"could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\"",
which causes an automatic restart of the WAL receiver because the error
is promoted to FATAL. This change should therefore improve the stability
of the WAL receiver, as rename() overwrites an existing TLI history file
already fetched by the startup process at recovery.
This is a bug fix, but given how unlikely the problem is (it requires
one or more crashes at an exceptionally bad moment), no backpatch is
done. Also, I want to be careful with such changes (aaa3aed did the
opposite of this change by removing HAVE_WORKING_LINK so that Windows
would do a link() rather than a rename(), but this was not
concurrent-safe). A backpatch could be revisited in the future. This
is the second time this change is attempted, ccfbd92 being the first
one, but this time no assertions are added for the case of a TLI history
file written concurrently by the WAL receiver or the startup process
because we can expect one to exist (some of the TAP tests are able to
trigger this with the right timing).
Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz
2022-07-05 03:16:12 +02:00
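The difference between the two renaming strategies discussed in the commit message above can be sketched in a few lines of POSIX C. This is a simplified illustration under stated assumptions, not PostgreSQL's implementation: the helper names move_excl_sketch() and move_overwrite_sketch() are hypothetical, and the fsync() calls that make the real functions durable are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: move oldpath to newpath without overwriting.
 * link() fails with EEXIST if newpath already exists.  A crash between
 * link() and unlink() leaves two hard links to the same file.
 */
static int
move_excl_sketch(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);

    if (link(oldpath, newpath) < 0)
        return -1;              /* errno was set by link() */
    return unlink(oldpath);
}

/*
 * Hypothetical helper: move oldpath to newpath, replacing any existing
 * newpath.  rename() swaps the name in one step, so there is no state
 * with two links, but there is no overwrite protection either.
 */
static int
move_overwrite_sketch(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);

    return rename(oldpath, newpath);
}
```

With an existing target file, the first helper fails with EEXIST (the situation the WAL receiver ran into), while the second one silently replaces the target.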
|
|
|
durable_rename(tmppath, path, ERROR);
|
2012-12-13 18:00:00 +01:00
|
|
|
}
|
|
|
|
|
2012-12-04 14:28:58 +01:00
|
|
|
/*
|
|
|
|
* Returns true if 'expectedTLEs' contains a timeline with id 'tli'
|
|
|
|
*/
|
|
|
|
bool
|
|
|
|
tliInHistory(TimeLineID tli, List *expectedTLEs)
|
|
|
|
{
|
|
|
|
ListCell *cell;
|
|
|
|
|
|
|
|
foreach(cell, expectedTLEs)
|
|
|
|
{
|
|
|
|
if (((TimeLineHistoryEntry *) lfirst(cell))->tli == tli)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns the ID of the timeline in use at a particular point in time, in
|
|
|
|
* the given timeline history.
|
|
|
|
*/
|
|
|
|
TimeLineID
|
|
|
|
tliOfPointInHistory(XLogRecPtr ptr, List *history)
|
|
|
|
{
|
|
|
|
ListCell *cell;
|
|
|
|
|
|
|
|
foreach(cell, history)
|
|
|
|
{
|
|
|
|
TimeLineHistoryEntry *tle = (TimeLineHistoryEntry *) lfirst(cell);
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2012-12-28 17:06:15 +01:00
|
|
|
if ((XLogRecPtrIsInvalid(tle->begin) || tle->begin <= ptr) &&
|
|
|
|
(XLogRecPtrIsInvalid(tle->end) || ptr < tle->end))
|
2012-12-04 14:28:58 +01:00
|
|
|
{
|
|
|
|
/* found it */
|
|
|
|
return tle->tli;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* shouldn't happen. */
|
|
|
|
elog(ERROR, "timeline history was not contiguous");
|
|
|
|
return 0; /* keep compiler quiet */
|
|
|
|
}
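The range test in tliOfPointInHistory() can be tried outside the server with a small standalone sketch. The Entry struct, the array-based history, and the name tli_of_point_sketch() are stand-ins invented for this illustration; the real code walks a List of TimeLineHistoryEntry and raises an error instead of returning 0 when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types, assumed for this sketch only. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;          /* inclusive; invalid means no lower bound */
    XLogRecPtr  end;            /* exclusive; invalid means no upper bound */
} Entry;

/*
 * Return the timeline whose [begin, end) range contains ptr, or 0 if
 * no entry covers it (the real function reports an error instead).
 */
static TimeLineID
tli_of_point_sketch(XLogRecPtr ptr, const Entry *history, size_t n)
{
    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const Entry *e = &history[i];

        if ((e->begin == InvalidXLogRecPtr || e->begin <= ptr) &&
            (e->end == InvalidXLogRecPtr || ptr < e->end))
            return e->tli;
    }
    return 0;
}
```

Because the lower bound is inclusive and the upper bound is exclusive, a pointer that sits exactly on a switchpoint belongs to the newer timeline.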
|
|
|
|
|
|
|
|
/*
|
Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_REPLICATION command.
When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.
This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
the BASE_BACKUP command sends the begin and end WAL positions, it now also
sends the timeline IDs corresponding to the positions.
2013-01-17 19:23:00 +01:00
|
|
|
* Returns the point in history where we branched off the given timeline,
|
|
|
|
* and the timeline we branched to (*nextTLI). Returns InvalidXLogRecPtr if
|
|
|
|
 * the timeline is current, i.e. we have not branched off from it, and throws
|
|
|
|
* an error if the timeline is not part of this server's history.
|
2012-12-04 14:28:58 +01:00
|
|
|
*/
|
|
|
|
XLogRecPtr
|
2013-01-17 19:23:00 +01:00
|
|
|
tliSwitchPoint(TimeLineID tli, List *history, TimeLineID *nextTLI)
|
2012-12-04 14:28:58 +01:00
|
|
|
{
|
|
|
|
ListCell *cell;
|
|
|
|
|
2013-01-17 19:23:00 +01:00
|
|
|
if (nextTLI)
|
|
|
|
*nextTLI = 0;
|
2012-12-04 14:28:58 +01:00
|
|
|
foreach(cell, history)
|
|
|
|
{
|
|
|
|
TimeLineHistoryEntry *tle = (TimeLineHistoryEntry *) lfirst(cell);
|
|
|
|
|
|
|
|
if (tle->tli == tli)
|
|
|
|
return tle->end;
|
2013-01-17 19:23:00 +01:00
|
|
|
if (nextTLI)
|
|
|
|
*nextTLI = tle->tli;
|
2012-12-04 14:28:58 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errmsg("requested timeline %u is not in this server's history",
|
|
|
|
tli)));
|
|
|
|
return InvalidXLogRecPtr; /* keep compiler quiet */
|
|
|
|
}
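The loop in tliSwitchPoint() only yields the right *nextTLI if the history is ordered newest first, since the timeline we branched to is the entry visited just before the match. A minimal standalone sketch of that idea follows. The Entry struct, the array-based history, and the name switch_point_sketch() are assumptions made for illustration, and a bool return replaces the ereport(ERROR) of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types, assumed for this sketch only. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;
    XLogRecPtr  end;            /* 0 (invalid) for the current timeline */
} Entry;

/*
 * Look up where timeline tli ended and which timeline followed it.
 * history must be ordered newest first.  Returns false if tli is not
 * part of the history; next_tli may be NULL if the caller does not
 * need it.
 */
static bool
switch_point_sketch(TimeLineID tli, const Entry *history, size_t n,
                    XLogRecPtr *switchpoint, TimeLineID *next_tli)
{
    TimeLineID  newer = 0;      /* tli of the entry visited before this one */

    assert(switchpoint != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (history[i].tli == tli)
        {
            *switchpoint = history[i].end;
            if (next_tli)
                *next_tli = newer;
            return true;
        }
        newer = history[i].tli;
    }
    return false;
}
```

For the current timeline, the sketch reports a switchpoint of 0 and a next timeline of 0, matching the "have not branched off" case described in the function comment.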
|