2012-10-02 12:37:19 +02:00
/*-------------------------------------------------------------------------
 *
 * timeline.c
 *		Functions for reading and writing timeline history files.
 *
 * A timeline history file lists the timeline changes of the timeline, in
 * a simple text format. They are archived along with the WAL segments.
 *
 * The files are named like "<tli>.history". For example, if the database
 * starts up and switches to timeline 5, the timeline history file would be
 * called "00000005.history".
 *
 * Each line in the file represents a timeline switch:
 *
 * <parentTLI> <switchpoint> <reason>
 *
 *	parentTLI	ID of the parent timeline
 *	switchpoint XLogRecPtr of the WAL position where the switch happened
 *	reason		human-readable explanation of why the timeline was changed
 *
 * The fields are separated by tabs. Lines beginning with # are comments, and
 * are ignored. Empty lines are also ignored.
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/backend/access/transam/timeline.c
 *
 *-------------------------------------------------------------------------
 */
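
/*
 * Illustrative sketch, not part of timeline.c: one way the file name and
 * line layout described in the header could be produced with plain C.
 * The helper names history_file_name() and history_line() are hypothetical.
 * The "%08X" width is an assumption inferred from the "00000005.history"
 * example, and the line layout follows the tab-separated
 * <parentTLI> <switchpoint> <reason> description.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "<tli>.history", with the ID as 8 hex digits. */
static void
history_file_name(char *dst, size_t len, uint32_t tli)
{
	snprintf(dst, len, "%08X.history", (unsigned int) tli);
}

/* Hypothetical helper: one tab-separated history line, newline-terminated. */
static void
history_line(char *dst, size_t len, uint32_t parent_tli,
			 uint64_t switchpoint, const char *reason)
{
	/* The switchpoint is printed as two 32-bit hex halves: high/low. */
	snprintf(dst, len, "%u\t%X/%X\t%s\n",
			 (unsigned int) parent_tli,
			 (unsigned int) (switchpoint >> 32),
			 (unsigned int) (switchpoint & 0xFFFFFFFFu),
			 reason);
}
```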

#include "postgres.h"

#include <sys/stat.h>
#include <unistd.h>

#include "access/timeline.h"
#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "access/xlogdefs.h"
Create and use wait events for read, write, and fsync operations.
Previous commits, notably 53be0b1add7064ca5db3cd884302dfc3268d884e and
6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O. Add wait events for those operations, too.
Rushabh Lathia, with further hacking by me. Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.
Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
2017-03-18 12:43:01 +01:00
#include "pgstat.h"
#include "storage/fd.h"
Fix more issues with cascading replication and timeline switches.
When a standby server follows the master using WAL archive, and it chooses
a new timeline (recovery_target_timeline='latest'), it only fetches the
timeline history file for the chosen target timeline, not any other history
files that might be missing from pg_xlog. For example, if the current
timeline is 2, and we choose 4 as the new recovery target timeline, the
history file for timeline 3 is not fetched, even if it's part of this
server's history. That's enough for the standby itself - the history file
for timeline 4 includes timeline 3 as well - but if a cascading standby
server wants to recover to timeline 3, it needs the history file. To fix,
when a new recovery target timeline is chosen, try to copy any missing
history files from the archive to pg_xlog between the old and new target
timeline.
A second similar issue was with the WAL files. When a standby recovers from
archive, and it reaches a segment that contains a switch to a new timeline,
recovery fetches only the WAL file labelled with the new timeline's ID. The
file from the new timeline contains a copy of the WAL from the old timeline
up to the point where the switch happened, and recovery recovers it from the
new file. But in streaming replication, walsender only tries to read it
from the old timeline's file. To fix, change walsender to read it from the
new file, so that it behaves the same as recovery in that sense, and doesn't
try to open the possibly nonexistent file with the old timeline's ID.
2013-01-23 09:01:04 +01:00
/*
 * Copies all timeline history files with IDs from 'begin' up to, but not
 * including, 'end' from the archive to pg_wal. Timeline 1 is skipped because
 * it never has a history file.
 */
void
restoreTimeLineHistoryFiles(TimeLineID begin, TimeLineID end)
{
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	TimeLineID	tli;

	for (tli = begin; tli < end; tli++)
	{
		if (tli == 1)
			continue;

		TLHistoryFileName(histfname, tli);
		if (RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false))
			KeepFileRestoredFromArchive(path, histfname);
	}
}
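
/*
 * Illustrative sketch, not part of timeline.c: the loop bounds used by
 * restoreTimeLineHistoryFiles() can be checked in isolation. The helper
 * history_tlis_to_restore() is hypothetical and only records which timeline
 * IDs would be requested; it does not touch any archive.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the timeline IDs in [begin, end), skipping
 * timeline 1, into out[]. Returns how many IDs were stored (at most cap).
 */
static size_t
history_tlis_to_restore(uint32_t begin, uint32_t end, uint32_t *out, size_t cap)
{
	size_t		n = 0;
	uint32_t	tli;

	for (tli = begin; tli < end; tli++)
	{
		if (tli == 1)
			continue;			/* timeline 1 never has a history file */
		if (n < cap)
			out[n++] = tli;
	}
	return n;
}
```

/*
 * With begin = 2 and end = 4 this yields timelines 2 and 3; timeline 4 itself
 * is not included because the upper bound is exclusive.
 */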

/*
 * Try to read a timeline's history file.
 *
 * If successful, return the list of component TLIs (the given TLI followed by
 * its ancestor TLIs). If we can't find the history file, assume that the
 * timeline has no parents, and return a list of just the specified timeline
 * ID.
 */
List *
readTimeLineHistory(TimeLineID targetTLI)
{
	List	   *result;
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	char		fline[MAXPGPATH];
	FILE	   *fd;
	TimeLineHistoryEntry *entry;
	TimeLineID	lasttli = 0;
	XLogRecPtr	prevend;
Keep timeline history files restored from archive in pg_xlog.
The cascading standby patch in 9.2 changed the way WAL files are treated
when restored from the archive. Before, they were restored under a temporary
filename, and not kept in pg_xlog, but after the patch, they were copied
under pg_xlog. This is necessary for a cascading standby to find them, but
it also means that if the archive goes offline and a standby is restarted,
it can recover back to where it was using the files in pg_xlog. It also
means that if you take an offline backup from a standby server, it includes
all the required WAL files in pg_xlog.
However, the same change was not made to timeline history files, so if the
WAL segment containing the checkpoint record contains a timeline switch, you
will still get an error if you try to restart recovery without the archive,
or recover from an offline backup taken from the standby.
With this patch, timeline history files restored from archive are copied
into pg_xlog like WAL files are, so that pg_xlog contains all the files
required to recover. This is a corner-case pre-existing issue in 9.2, but
even more important in master where it's possible for a standby to follow a
timeline switch through streaming replication. To make that possible, the
timeline history files must be present in pg_xlog.
2012-12-30 13:26:47 +01:00
	bool		fromArchive = false;

	/* Timeline 1 does not have a history file, so no need to check */
	if (targetTLI == 1)
	{
		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = targetTLI;
		entry->begin = entry->end = InvalidXLogRecPtr;
		return list_make1(entry);
	}

	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, targetTLI);
		fromArchive =
			RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, targetTLI);

	fd = AllocateFile(path, "r");
	if (fd == NULL)
	{
		if (errno != ENOENT)
			ereport(FATAL,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		/* Not there, so assume no parents */
		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = targetTLI;
		entry->begin = entry->end = InvalidXLogRecPtr;
		return list_make1(entry);
	}

	result = NIL;

	/*
	 * Parse the file...
	 */
	prevend = InvalidXLogRecPtr;
	while (fgets(fline, sizeof(fline), fd) != NULL)
	{
		/* skip leading whitespace and check for # comment */
		char	   *ptr;
		TimeLineID	tli;
		uint32		switchpoint_hi;
		uint32		switchpoint_lo;
		int			nfields;

		for (ptr = fline; *ptr; ptr++)
		{
			if (!isspace((unsigned char) *ptr))
				break;
		}
		if (*ptr == '\0' || *ptr == '#')
			continue;

		nfields = sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi, &switchpoint_lo);

		if (nfields < 1)
		{
			/* expect a numeric timeline ID as first field of line */
			ereport(FATAL,
					(errmsg("syntax error in history file: %s", fline),
					 errhint("Expected a numeric timeline ID.")));
		}
		if (nfields != 3)
			ereport(FATAL,
					(errmsg("syntax error in history file: %s", fline),
					 errhint("Expected a transaction log switchpoint location.")));

		if (result && tli <= lasttli)
			ereport(FATAL,
					(errmsg("invalid data in history file: %s", fline),
					 errhint("Timeline IDs must be in increasing sequence.")));

		lasttli = tli;

		entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
		entry->tli = tli;
		entry->begin = prevend;
		entry->end = ((uint64) (switchpoint_hi)) << 32 | (uint64) switchpoint_lo;
		prevend = entry->end;

		/* Build list with newest item first */
		result = lcons(entry, result);

		/* we ignore the remainder of each line */
	}

	FreeFile(fd);

	if (result && targetTLI <= lasttli)
		ereport(FATAL,
				(errmsg("invalid data in history file \"%s\"", path),
				 errhint("Timeline IDs must be less than child timeline's ID.")));

	/*
	 * Create one more entry for the "tip" of the timeline, which has no entry
	 * in the history file.
	 */
	entry = (TimeLineHistoryEntry *) palloc(sizeof(TimeLineHistoryEntry));
	entry->tli = targetTLI;
	entry->begin = prevend;
	entry->end = InvalidXLogRecPtr;

	result = lcons(entry, result);

	/*
	 * If the history file was fetched from archive, save it in pg_wal for
	 * future reference.
	 */
	if (fromArchive)
		KeepFileRestoredFromArchive(path, histfname);

	return result;
}
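
/*
 * Illustrative sketch, not part of timeline.c: the parsing rules used by
 * readTimeLineHistory() applied to an in-memory string instead of a file.
 * HistEntry and parse_history() are hypothetical names, 0 stands in for
 * InvalidXLogRecPtr, and entries are stored oldest first in a plain array
 * (the real function builds a List with the newest entry first). Errors
 * are reported with -1 where the real code raises FATAL.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
	uint32_t	tli;
	uint64_t	begin;			/* 0 stands in for InvalidXLogRecPtr */
	uint64_t	end;
} HistEntry;

/* Returns the number of entries written to out[], or -1 on bad input. */
static int
parse_history(const char *text, uint32_t target_tli, HistEntry *out, int cap)
{
	int			n = 0;
	uint64_t	prevend = 0;
	uint32_t	lasttli = 0;

	while (*text != '\0')
	{
		char		buf[256];
		size_t		len = strcspn(text, "\n");
		size_t		copy = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
		const char *p;
		unsigned int tli, hi, lo;

		/* Copy one line so that sscanf cannot read into the next one. */
		memcpy(buf, text, copy);
		buf[copy] = '\0';
		text += len;
		if (*text == '\n')
			text++;

		/* Skip leading whitespace, then ignore empty and # comment lines. */
		for (p = buf; isspace((unsigned char) *p); p++)
			;
		if (*p == '\0' || *p == '#')
			continue;

		if (sscanf(buf, "%u\t%X/%X", &tli, &hi, &lo) != 3)
			return -1;
		/* Timeline IDs must be strictly increasing. */
		if ((n > 0 && tli <= lasttli) || n >= cap)
			return -1;

		out[n].tli = tli;
		out[n].begin = prevend;
		out[n].end = ((uint64_t) hi << 32) | lo;
		prevend = out[n].end;
		lasttli = tli;
		n++;
	}

	if ((n > 0 && target_tli <= lasttli) || n >= cap)
		return -1;

	/* One more entry for the "tip" of the target timeline. */
	out[n].tli = target_tli;
	out[n].begin = prevend;
	out[n].end = 0;
	return n + 1;
}
```

/*
 * Each parsed entry begins where the previous one ended, so the switchpoints
 * partition the WAL positions among the ancestor timelines.
 */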

/*
 * Probe whether a timeline history file exists for the given timeline ID
 */
bool
existsTimeLineHistory(TimeLineID probeTLI)
{
	char		path[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	FILE	   *fd;

	/* Timeline 1 does not have a history file, so no need to check */
	if (probeTLI == 1)
		return false;

	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, probeTLI);
		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, probeTLI);

	fd = AllocateFile(path, "r");
	if (fd != NULL)
	{
		FreeFile(fd);
		return true;
	}
	else
	{
		if (errno != ENOENT)
			ereport(FATAL,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		return false;
	}
}
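
/*
 * Illustrative sketch, not part of timeline.c: the same "open it and look at
 * errno" probe written against standard C stdio. probe_file() is a
 * hypothetical name; it returns -1 where existsTimeLineHistory() would
 * raise FATAL, so that "missing" and "unreadable" stay distinguishable.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Returns 1 if the file opens, 0 if it does not exist, -1 otherwise. */
static int
probe_file(const char *path)
{
	FILE	   *fp = fopen(path, "r");

	if (fp != NULL)
	{
		fclose(fp);
		return 1;
	}
	/* Only ENOENT means "no such file"; anything else is a real error. */
	return (errno == ENOENT) ? 0 : -1;
}
```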

/*
 * Find the newest existing timeline, assuming that startTLI exists.
 *
 * Note: while this is somewhat heuristic, it does positively guarantee
 * that (result + 1) is not a known timeline, and therefore it should
 * be safe to assign that ID to a new timeline.
 */
TimeLineID
findNewestTimeLine(TimeLineID startTLI)
{
	TimeLineID	newestTLI;
	TimeLineID	probeTLI;

	/*
	 * The algorithm is just to probe for the existence of timeline history
	 * files.  XXX is it useful to allow gaps in the sequence?
	 */
	newestTLI = startTLI;

	for (probeTLI = startTLI + 1;; probeTLI++)
	{
		if (existsTimeLineHistory(probeTLI))
		{
			newestTLI = probeTLI;	/* probeTLI exists */
		}
		else
		{
			/* doesn't exist, assume we're done */
			break;
		}
	}

	return newestTLI;
}
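
/*
 * Illustrative sketch, not part of timeline.c: the probing loop of
 * findNewestTimeLine() with the existence check passed in as a callback, so
 * the effect of a gap in the sequence can be observed. find_newest(),
 * exists_fn and demo_exists() are hypothetical names; demo_exists() pretends
 * that history files exist for timelines 2, 3 and 5 only.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*exists_fn) (uint32_t tli);

/* Probe start_tli + 1, start_tli + 2, ... and stop at the first miss. */
static uint32_t
find_newest(uint32_t start_tli, exists_fn exists)
{
	uint32_t	newest = start_tli;
	uint32_t	probe;

	for (probe = start_tli + 1; exists(probe); probe++)
		newest = probe;
	return newest;
}

/* Example predicate with a gap at timeline 4. */
static bool
demo_exists(uint32_t tli)
{
	return tli == 2 || tli == 3 || tli == 5;
}
```

/*
 * Starting from timeline 1 the search stops at 3, because the missing
 * timeline 4 ends the probe before 5 is ever checked. The result still
 * satisfies the guarantee that (result + 1) is not a known timeline.
 */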

/*
 * Create a new timeline history file.
 *
 *	newTLI: ID of the new timeline
 *	parentTLI: ID of its immediate parent
 *	switchpoint: XLOG position where the system switched to the new timeline
 *	reason: human-readable explanation of why the timeline was switched
 *
 * Currently this is only used at the end of recovery, and so there are no
 * locking considerations. But we should be just as tense as XLogFileInit to
 * avoid emplacing a bogus file.
 */
void
writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
					 XLogRecPtr switchpoint, char *reason)
{
	char		path[MAXPGPATH];
	char		tmppath[MAXPGPATH];
	char		histfname[MAXFNAMELEN];
	char		buffer[BLCKSZ];
	int			srcfd;
	int			fd;
	int			nbytes;

	Assert(newTLI > parentTLI); /* else bad selection of newTLI */

	/*
	 * Write into a temp file name.
	 */
	snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());

	unlink(tmppath);

	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
Add OpenTransientFile, with automatic cleanup at end-of-xact.
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add function called OpenTransientFile, which returns a
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().
This plugs a few rare fd leaks in error cases:
1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
use OpenTransientFile here because the fd is supposed to persist over
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
PathNameOpenFile.
In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.
The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
2012-11-27 09:25:50 +01:00
	fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL,
						   S_IRUSR | S_IWUSR);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m", tmppath)));

	/*
	 * If a history file exists for the parent, copy it verbatim
	 */
	if (ArchiveRecoveryRequested)
	{
		TLHistoryFileName(histfname, parentTLI);
		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
	}
	else
		TLHistoryFilePath(path, parentTLI);

	srcfd = OpenTransientFile(path, O_RDONLY, 0);
	if (srcfd < 0)
	{
		if (errno != ENOENT)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not open file \"%s\": %m", path)));
		/* Not there, so assume parent has no parents */
	}
	else
	{
		for (;;)
		{
			errno = 0;
			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_READ);
			nbytes = (int) read(srcfd, buffer, sizeof(buffer));
			pgstat_report_wait_end();
			if (nbytes < 0 || errno != 0)
				ereport(ERROR,
						(errcode_for_file_access(),
						 errmsg("could not read file \"%s\": %m", path)));
			if (nbytes == 0)
				break;
			errno = 0;
			pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_WRITE);
			if ((int) write(fd, buffer, nbytes) != nbytes)
			{
				int			save_errno = errno;

				/*
				 * If we fail to make the file, delete it to release disk
				 * space
				 */
				unlink(tmppath);

				/*
				 * if write didn't set errno, assume problem is no disk space
				 */
				errno = save_errno ? save_errno : ENOSPC;

				ereport(ERROR,
						(errcode_for_file_access(),
						 errmsg("could not write to file \"%s\": %m", tmppath)));
			}
			pgstat_report_wait_end();
		}
		CloseTransientFile(srcfd);
	}

	/*
	 * Append one line with the details of this timeline split.
	 *
	 * If we did have a parent file, insert an extra newline just in case the
	 * parent file failed to end with one.
	 */
	snprintf(buffer, sizeof(buffer),
			 "%s%u\t%X/%X\t%s\n",
			 (srcfd < 0) ? "" : "\n",
			 parentTLI,
			 (uint32) (switchpoint >> 32), (uint32) (switchpoint),
			 reason);

	nbytes = strlen(buffer);
	errno = 0;
	if ((int) write(fd, buffer, nbytes) != nbytes)
	{
		int			save_errno = errno;

		/*
		 * If we fail to make the file, delete it to release disk space
		 */
		unlink(tmppath);
		/* if write didn't set errno, assume problem is no disk space */
		errno = save_errno ? save_errno : ENOSPC;

		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write to file \"%s\": %m", tmppath)));
	}

	pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_SYNC);
	if (pg_fsync(fd) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not fsync file \"%s\": %m", tmppath)));
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_end();
|
2012-10-02 12:37:19 +02:00
|
|
|
|
2012-11-27 09:25:50 +01:00
|
|
|
if (CloseTransientFile(fd))
|
2012-10-02 12:37:19 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not close file \"%s\": %m", tmppath)));
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now move the completed history file into place with its final name.
|
|
|
|
*/
|
|
|
|
TLHistoryFilePath(path, newTLI);
|
|
|
|
|
|
|
|
/*
|
2016-03-10 03:53:53 +01:00
|
|
|
* Perform the rename using link if available, paranoidly trying to avoid
|
|
|
|
* overwriting an existing file (there shouldn't be one).
|
2012-10-02 12:37:19 +02:00
|
|
|
*/
|
2016-03-10 03:53:53 +01:00
|
|
|
durable_link_or_rename(tmppath, path, ERROR);
|
2012-10-03 08:08:13 +02:00
|
|
|
|
|
|
|
/* The history file can be archived immediately. */
|
2014-11-06 13:24:40 +01:00
|
|
|
if (XLogArchivingActive())
|
|
|
|
{
|
|
|
|
TLHistoryFileName(histfname, newTLI);
|
|
|
|
XLogArchiveNotify(histfname);
|
|
|
|
}
|
2012-10-02 12:37:19 +02:00
|
|
|
}
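As a minimal sketch of the line format appended above, the following standalone C helper builds one history line outside the backend. The name format_history_line and its signature are hypothetical and exist only for illustration; plain uint32_t and uint64_t stand in for TimeLineID and XLogRecPtr.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one history line as "<parentTLI>\t<hi>/<lo>\t<reason>\n".
 * format_history_line is a hypothetical helper, not part of timeline.c.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the buffer is too small.
 */
static int
format_history_line(char *buf, size_t buflen, uint32_t parent_tli,
                    uint64_t switchpoint, const char *reason)
{
    int n;

    assert(buf != NULL && reason != NULL);

    /* The 64-bit WAL position is printed as two 32-bit hex halves. */
    n = snprintf(buf, buflen, "%u\t%X/%X\t%s\n",
                 (unsigned int) parent_tli,
                 (unsigned int) (switchpoint >> 32),
                 (unsigned int) (switchpoint & 0xFFFFFFFFu),
                 reason);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return n;
}
```

Called with parent timeline 4, switchpoint 0x12A000000 and the reason "promote", this produces the 21-byte line "4\t1/2A000000\tpromote\n". Unlike the function above, the sketch does not prepend the defensive newline that is written when a parent history file was copied.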
|
2012-12-04 14:28:58 +01:00
|
|
|
|
Allow a streaming replication standby to follow a timeline switch.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.
There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline is reached, putting the walsender
back into accepting a replication command.
Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
2012-12-13 18:00:00 +01:00
|
|
|
/*
|
|
|
|
* Writes a history file for given timeline and contents.
|
|
|
|
*
|
|
|
|
* Currently this is only used in the walreceiver process, and so there are
|
|
|
|
* no locking considerations. But we should be just as tense as XLogFileInit
|
|
|
|
* to avoid emplacing a bogus file.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
|
|
|
|
{
|
|
|
|
char path[MAXPGPATH];
|
|
|
|
char tmppath[MAXPGPATH];
|
|
|
|
int fd;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Write into a temp file name.
|
|
|
|
*/
|
|
|
|
snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
|
|
|
|
|
|
|
|
unlink(tmppath);
|
|
|
|
|
|
|
|
/* do not use get_sync_bit() here --- want to fsync only at end of fill */
|
|
|
|
fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL,
|
|
|
|
S_IRUSR | S_IWUSR);
|
|
|
|
if (fd < 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create file \"%s\": %m", tmppath)));
|
|
|
|
|
|
|
|
errno = 0;
|
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_FILE_WRITE);
|
2012-12-13 18:00:00 +01:00
|
|
|
if ((int) write(fd, content, size) != size)
|
|
|
|
{
|
|
|
|
int save_errno = errno;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we fail to make the file, delete it to release disk space
|
|
|
|
*/
|
|
|
|
unlink(tmppath);
|
|
|
|
/* if write didn't set errno, assume problem is no disk space */
|
|
|
|
errno = save_errno ? save_errno : ENOSPC;
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not write to file \"%s\": %m", tmppath)));
|
|
|
|
}
|
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_end();
|
2012-12-13 18:00:00 +01:00
|
|
|
|
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_start(WAIT_EVENT_TIMELINE_HISTORY_FILE_SYNC);
|
2012-12-13 18:00:00 +01:00
|
|
|
if (pg_fsync(fd) != 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not fsync file \"%s\": %m", tmppath)));
|
2017-03-18 12:43:01 +01:00
|
|
|
pgstat_report_wait_end();
|
2012-12-13 18:00:00 +01:00
|
|
|
|
|
|
|
if (CloseTransientFile(fd))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not close file \"%s\": %m", tmppath)));
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now move the completed history file into place with its final name.
|
|
|
|
*/
|
|
|
|
TLHistoryFilePath(path, tli);
|
|
|
|
|
|
|
|
/*
|
2016-03-10 03:53:53 +01:00
|
|
|
* Perform the rename using link if available, paranoidly trying to avoid
|
|
|
|
* overwriting an existing file (there shouldn't be one).
|
2012-12-13 18:00:00 +01:00
|
|
|
*/
|
2016-03-10 03:53:53 +01:00
|
|
|
durable_link_or_rename(tmppath, path, ERROR);
|
2012-12-13 18:00:00 +01:00
|
|
|
}
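The write-to-temp-file, fsync, then move-into-place sequence used by writeTimeLineHistoryFile can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not the backend code: write_file_durably is a hypothetical name, errors are returned instead of raised with ereport, and a plain rename() stands in for durable_link_or_rename(), whose behavior is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write 'size' bytes to 'tmppath', flush them to disk, then rename the
 * file to 'path'. write_file_durably is a hypothetical name. Returns 0 on
 * success and -1 on failure with errno set.
 */
static int
write_file_durably(const char *tmppath, const char *path,
                   const char *content, size_t size)
{
    int fd;

    assert(tmppath != NULL && path != NULL && content != NULL);

    unlink(tmppath);            /* discard any stale temp file */
    fd = open(tmppath, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    errno = 0;
    if (write(fd, content, size) != (ssize_t) size)
    {
        int save_errno = errno;

        close(fd);
        unlink(tmppath);        /* release the disk space again */
        /* if write didn't set errno, assume the disk is full */
        errno = save_errno ? save_errno : ENOSPC;
        return -1;
    }

    if (fsync(fd) != 0)
    {
        int save_errno = errno;

        close(fd);
        unlink(tmppath);
        errno = save_errno;
        return -1;
    }

    if (close(fd) != 0)
        return -1;

    /* Assumption: rename() here silently replaces an existing 'path'. */
    return rename(tmppath, path);
}
```

Readers never observe a half-written file under the final name, because the name only appears once the contents have been flushed. The sketch does not fsync the containing directory, so the rename itself may not survive a crash.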
|
|
|
|
|
2012-12-04 14:28:58 +01:00
|
|
|
/*
|
|
|
|
* Returns true if 'expectedTLEs' contains a timeline with id 'tli'
|
|
|
|
*/
|
|
|
|
bool
|
|
|
|
tliInHistory(TimeLineID tli, List *expectedTLEs)
|
|
|
|
{
|
2013-05-29 22:58:43 +02:00
|
|
|
ListCell *cell;
|
2012-12-04 14:28:58 +01:00
|
|
|
|
|
|
|
foreach(cell, expectedTLEs)
|
|
|
|
{
|
|
|
|
if (((TimeLineHistoryEntry *) lfirst(cell))->tli == tli)
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns the ID of the timeline in use at a particular point in time, in
|
|
|
|
* the given timeline history.
|
|
|
|
*/
|
|
|
|
TimeLineID
|
|
|
|
tliOfPointInHistory(XLogRecPtr ptr, List *history)
|
|
|
|
{
|
2013-05-29 22:58:43 +02:00
|
|
|
ListCell *cell;
|
2012-12-04 14:28:58 +01:00
|
|
|
|
|
|
|
foreach(cell, history)
|
|
|
|
{
|
|
|
|
TimeLineHistoryEntry *tle = (TimeLineHistoryEntry *) lfirst(cell);
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2012-12-28 17:06:15 +01:00
|
|
|
if ((XLogRecPtrIsInvalid(tle->begin) || tle->begin <= ptr) &&
|
|
|
|
(XLogRecPtrIsInvalid(tle->end) || ptr < tle->end))
|
2012-12-04 14:28:58 +01:00
|
|
|
{
|
|
|
|
/* found it */
|
|
|
|
return tle->tli;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* shouldn't happen. */
|
|
|
|
elog(ERROR, "timeline history was not contiguous");
|
2013-05-29 22:58:43 +02:00
|
|
|
return 0; /* keep compiler quiet */
|
2012-12-04 14:28:58 +01:00
|
|
|
}
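The range test in tliOfPointInHistory treats each entry as a half-open interval [begin, end), with an invalid pointer meaning "unbounded" on that side. A standalone sketch of the same lookup follows; HistoryEntry and tli_of_point are hypothetical stand-ins, an array replaces the List, and the value 0 is assumed to play the role of the invalid WAL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TimeLineHistoryEntry (hypothetical). */
typedef struct
{
    uint32_t tli;
    uint64_t begin;             /* inclusive; 0 means no lower bound */
    uint64_t end;               /* exclusive; 0 means no upper bound */
} HistoryEntry;

/*
 * Return the timeline whose [begin, end) range contains 'ptr', or 0 if
 * no entry matches (the backend raises an error in that case).
 */
static uint32_t
tli_of_point(uint64_t ptr, const HistoryEntry *history, size_t n)
{
    size_t i;

    assert(history != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        const HistoryEntry *e = &history[i];

        if ((e->begin == 0 || e->begin <= ptr) &&
            (e->end == 0 || ptr < e->end))
            return e->tli;
    }
    return 0;
}
```

Because the upper bound is exclusive, a position exactly at a switchpoint belongs to the newer timeline. With entries 3 [0x5000000, unbounded), 2 [0x3000000, 0x5000000) and 1 [unbounded, 0x3000000), position 0x3000000 maps to timeline 2 and 0x2FFFFFF maps to timeline 1.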
|
|
|
|
|
|
|
|
/*
|
Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_REPLICATION command.
When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.
This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding to the positions.
2013-01-17 19:23:00 +01:00
|
|
|
* Returns the point in history where we branched off the given timeline,
|
|
|
|
* and the timeline we branched to (*nextTLI). Returns InvalidXLogRecPtr if
|
|
|
|
* the timeline is current, i.e. we have not branched off from it, and throws
|
|
|
|
* an error if the timeline is not part of this server's history.
|
2012-12-04 14:28:58 +01:00
|
|
|
*/
|
|
|
|
XLogRecPtr
|
2013-01-17 19:23:00 +01:00
|
|
|
tliSwitchPoint(TimeLineID tli, List *history, TimeLineID *nextTLI)
|
2012-12-04 14:28:58 +01:00
|
|
|
{
|
|
|
|
ListCell *cell;
|
|
|
|
|
2013-01-17 19:23:00 +01:00
|
|
|
if (nextTLI)
|
|
|
|
*nextTLI = 0;
|
2013-05-29 22:58:43 +02:00
|
|
|
foreach(cell, history)
|
2012-12-04 14:28:58 +01:00
|
|
|
{
|
|
|
|
TimeLineHistoryEntry *tle = (TimeLineHistoryEntry *) lfirst(cell);
|
|
|
|
|
|
|
|
if (tle->tli == tli)
|
|
|
|
return tle->end;
|
2013-01-17 19:23:00 +01:00
|
|
|
if (nextTLI)
|
|
|
|
*nextTLI = tle->tli;
|
2012-12-04 14:28:58 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errmsg("requested timeline %u is not in this server's history",
|
|
|
|
tli)));
|
2013-05-29 22:58:43 +02:00
|
|
|
return InvalidXLogRecPtr; /* keep compiler quiet */
|
2012-12-04 14:28:58 +01:00
|
|
|
}
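The loop in tliSwitchPoint remembers the entry visited just before the match and reports it as the next timeline. The sketch below shows that bookkeeping in isolation. It assumes the history is ordered newest first, which this section does not state; TlEntry and find_switch_point are hypothetical names, and a return code replaces the error the backend raises for an unknown timeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TimeLineHistoryEntry (hypothetical). */
typedef struct
{
    uint32_t tli;
    uint64_t begin;
    uint64_t end;               /* 0 means the timeline has not ended */
} TlEntry;

/*
 * Look up 'tli' in a history assumed to be ordered newest first.
 * On success, store the end of that timeline in *switchpoint and the
 * timeline branched to in *next_tli (0 if there is none), and return 1.
 * Return 0 if 'tli' is not in the history.
 */
static int
find_switch_point(uint32_t tli, const TlEntry *history, size_t n,
                  uint64_t *switchpoint, uint32_t *next_tli)
{
    uint32_t newer = 0;         /* tli of the previously visited entry */
    size_t   i;

    assert(switchpoint != NULL);

    for (i = 0; i < n; i++)
    {
        if (history[i].tli == tli)
        {
            *switchpoint = history[i].end;
            if (next_tli)
                *next_tli = newer;
            return 1;
        }
        newer = history[i].tli;
    }
    return 0;
}
```

For the newest-first history 3, 2, 1 with timeline 2 ending at 0x5000000, looking up timeline 2 yields switchpoint 0x5000000 and next timeline 3, while looking up timeline 3 yields switchpoint 0 and next timeline 0, meaning that timeline is still current.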
|