2011-01-23 12:21:23 +01:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* pg_basebackup.c - receive a base backup using streaming replication protocol
|
|
|
|
*
|
|
|
|
* Author: Magnus Hagander <magnus@hagander.net>
|
|
|
|
*
|
2024-01-04 02:49:05 +01:00
|
|
|
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
|
2011-01-23 12:21:23 +01:00
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
|
|
|
* src/bin/pg_basebackup/pg_basebackup.c
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
|
2012-12-13 13:59:13 +01:00
|
|
|
#include "postgres_fe.h"
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <dirent.h>
|
2021-07-24 12:05:14 +02:00
|
|
|
#include <limits.h>
|
2022-08-13 13:33:34 +02:00
|
|
|
#include <sys/select.h>
|
2011-01-23 12:21:23 +01:00
|
|
|
#include <sys/stat.h>
|
2011-10-26 20:13:33 +02:00
|
|
|
#include <sys/wait.h>
|
2014-02-12 20:04:13 +01:00
|
|
|
#include <signal.h>
|
2013-01-05 16:54:06 +01:00
|
|
|
#include <time.h>
|
2011-01-23 12:21:23 +01:00
|
|
|
#ifdef HAVE_LIBZ
|
|
|
|
#include <zlib.h>
|
|
|
|
#endif
|
|
|
|
|
Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-20 07:03:48 +02:00
|
|
|
#include "access/xlog_internal.h"
|
2022-08-10 20:03:23 +02:00
|
|
|
#include "backup/basebackup.h"
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
#include "bbstreamer.h"
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
#include "common/compression.h"
|
2018-04-07 23:45:39 +02:00
|
|
|
#include "common/file_perm.h"
|
2016-09-29 18:00:00 +02:00
|
|
|
#include "common/file_utils.h"
|
2019-05-14 20:19:49 +02:00
|
|
|
#include "common/logging.h"
|
2021-07-24 11:35:03 +02:00
|
|
|
#include "fe_utils/option_utils.h"
|
2019-09-25 19:35:24 +02:00
|
|
|
#include "fe_utils/recovery_gen.h"
|
2011-01-23 12:21:23 +01:00
|
|
|
#include "getopt_long.h"
|
2011-10-26 20:13:33 +02:00
|
|
|
#include "receivelog.h"
|
|
|
|
#include "streamutil.h"
|
|
|
|
|
2018-04-03 13:47:16 +02:00
|
|
|
#define ERRCODE_DATA_CORRUPTED "XX001"
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2014-02-22 19:38:06 +01:00
|
|
|
typedef struct TablespaceListCell
|
|
|
|
{
|
|
|
|
struct TablespaceListCell *next;
|
|
|
|
char old_dir[MAXPGPATH];
|
|
|
|
char new_dir[MAXPGPATH];
|
|
|
|
} TablespaceListCell;
|
|
|
|
|
|
|
|
typedef struct TablespaceList
|
|
|
|
{
|
|
|
|
TablespaceListCell *head;
|
|
|
|
TablespaceListCell *tail;
|
|
|
|
} TablespaceList;
|
|
|
|
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
typedef struct ArchiveStreamState
|
|
|
|
{
|
|
|
|
int tablespacenum;
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
pg_compress_specification *compress;
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
bbstreamer *streamer;
|
|
|
|
bbstreamer *manifest_inject_streamer;
|
|
|
|
PQExpBuffer manifest_buffer;
|
|
|
|
char manifest_filename[MAXPGPATH];
|
|
|
|
FILE *manifest_file;
|
|
|
|
} ArchiveStreamState;
|
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
typedef struct WriteTarState
|
|
|
|
{
|
|
|
|
int tablespacenum;
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
bbstreamer *streamer;
|
2019-12-05 21:14:09 +01:00
|
|
|
} WriteTarState;
|
|
|
|
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
typedef struct WriteManifestState
|
|
|
|
{
|
|
|
|
char filename[MAXPGPATH];
|
|
|
|
FILE *file;
|
|
|
|
} WriteManifestState;
|
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
typedef void (*WriteDataCallback) (size_t nbytes, char *buf,
|
|
|
|
void *callback_data);
|
|
|
|
|
2016-10-20 17:24:37 +02:00
|
|
|
/*
|
|
|
|
* pg_xlog has been renamed to pg_wal in version 10. This version number
|
|
|
|
* should be compared with PQserverVersion().
|
|
|
|
*/
|
|
|
|
#define MINIMUM_VERSION_FOR_PG_WAL 100000
|
|
|
|
|
2017-01-16 13:56:43 +01:00
|
|
|
/*
|
|
|
|
* Temporary replication slots are supported from version 10.
|
|
|
|
*/
|
|
|
|
#define MINIMUM_VERSION_FOR_TEMP_SLOTS 100000
|
|
|
|
|
Disable silently generation of manifests with servers <= 12 in pg_basebackup
Since 0d8c9c1, pg_basebackup would generate an error if connected to a
backend version older than 12 where backup manifests are not supported.
Avoiding this error is possible by using the --no-manifest option.
This error handling could be confusing for some users, where patching a
backup script that interacts with multiple backend versions would cause
the addition of --no-manifest to potentially not generate a backup
manifest even for Postgres 13 and newer versions. As we want to
encourage the use of backup manifests as much as possible, this commit
silently disables manifests where not supported, instead of generating
an error.
While on it, rework a bit the code to make it more consistent with the
surroundings when generating the BASE_BACKUP command.
Per discussion with Andres Freund, Stephen Frost, Robert Haas, Álvaro
Herrera, Kyotaro Horiguchi, Tom Lane, David Steele, and me.
Author: Michael Paquier
Discussion: https://postgr.es/m/20200410080910.GZ1606@paquier.xyz
2020-04-16 06:57:07 +02:00
|
|
|
/*
|
|
|
|
* Backup manifests are supported from version 13.
|
|
|
|
*/
|
|
|
|
#define MINIMUM_VERSION_FOR_MANIFESTS 130000
|
|
|
|
|
2021-11-09 20:21:35 +01:00
|
|
|
/*
|
|
|
|
* Before v15, tar files received from the server will be improperly
|
|
|
|
* terminated.
|
|
|
|
*/
|
|
|
|
#define MINIMUM_VERSION_FOR_TERMINATED_TARFILE 150000
|
|
|
|
|
2023-12-20 15:49:12 +01:00
|
|
|
/*
|
|
|
|
* pg_wal/summaries exists beginning with version 17.
|
|
|
|
*/
|
|
|
|
#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 170000
|
|
|
|
|
2017-01-09 16:03:47 +01:00
|
|
|
/*
|
|
|
|
* Different ways to include WAL
|
|
|
|
*/
|
|
|
|
typedef enum
|
|
|
|
{
|
|
|
|
NO_WAL,
|
|
|
|
FETCH_WAL,
|
|
|
|
STREAM_WAL,
|
|
|
|
} IncludeWal;
|
|
|
|
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
/*
|
|
|
|
* Different places to perform compression
|
|
|
|
*/
|
|
|
|
typedef enum
|
|
|
|
{
|
|
|
|
COMPRESS_LOCATION_UNSPECIFIED,
|
|
|
|
COMPRESS_LOCATION_CLIENT,
|
|
|
|
COMPRESS_LOCATION_SERVER,
|
|
|
|
} CompressionLocation;
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/* Global options */
|
2013-12-16 10:27:30 +01:00
|
|
|
static char *basedir = NULL;
|
2014-02-22 19:38:06 +01:00
|
|
|
static TablespaceList tablespace_dirs = {NULL, NULL};
|
2017-08-31 04:28:36 +02:00
|
|
|
static char *xlog_dir = NULL;
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
static char format = '\0'; /* p(lain)/t(ar) */
|
2013-12-16 10:27:30 +01:00
|
|
|
static char *label = "pg_basebackup base backup";
|
2016-09-12 18:00:00 +02:00
|
|
|
static bool noclean = false;
|
2018-04-03 13:47:16 +02:00
|
|
|
static bool checksum_failure = false;
|
2013-12-16 10:27:30 +01:00
|
|
|
static bool showprogress = false;
|
2020-03-19 09:09:00 +01:00
|
|
|
static bool estimatesize = true;
|
2013-12-16 10:27:30 +01:00
|
|
|
static int verbose = 0;
|
2017-01-09 16:03:47 +01:00
|
|
|
static IncludeWal includewal = STREAM_WAL;
|
2013-12-16 10:27:30 +01:00
|
|
|
static bool fastcheckpoint = false;
|
|
|
|
static bool writerecoveryconf = false;
|
2016-09-29 18:00:00 +02:00
|
|
|
static bool do_sync = true;
|
2013-12-16 10:27:30 +01:00
|
|
|
static int standby_message_timeout = 10 * 1000; /* 10 sec = default */
|
2014-02-09 12:47:09 +01:00
|
|
|
static pg_time_t last_progress_report = 0;
|
2014-02-27 22:55:57 +01:00
|
|
|
static int32 maxrate = 0; /* no limit by default */
|
2017-01-16 13:56:43 +01:00
|
|
|
static char *replication_slot = NULL;
|
|
|
|
static bool temp_replication_slot = true;
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
static char *backup_target = NULL;
|
2017-09-26 22:07:52 +02:00
|
|
|
static bool create_slot = false;
|
|
|
|
static bool no_slot = false;
|
2018-04-03 13:47:16 +02:00
|
|
|
static bool verify_checksums = true;
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
static bool manifest = true;
|
|
|
|
static bool manifest_force_encode = false;
|
|
|
|
static char *manifest_checksums = NULL;
|
2023-09-07 01:27:00 +02:00
|
|
|
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
|
2014-02-27 22:55:57 +01:00
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
static bool success = false;
|
|
|
|
static bool made_new_pgdata = false;
|
|
|
|
static bool found_existing_pgdata = false;
|
|
|
|
static bool made_new_xlogdir = false;
|
|
|
|
static bool found_existing_xlogdir = false;
|
|
|
|
static bool made_tablespace_dirs = false;
|
|
|
|
static bool found_tablespace_dirs = false;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Progress indicators */
|
2019-09-03 11:59:36 +02:00
|
|
|
static uint64 totalsize_kb;
|
2011-01-23 12:21:23 +01:00
|
|
|
static uint64 totaldone;
|
|
|
|
static int tablespacecount;
|
2022-02-17 21:03:40 +01:00
|
|
|
static char *progress_filename = NULL;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
/* Pipe to communicate with background wal receiver process */
|
|
|
|
#ifndef WIN32
|
|
|
|
static int bgpipe[2] = {-1, -1};
|
|
|
|
#endif
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
/* Handle to child process */
|
|
|
|
static pid_t bgchild = -1;
|
2016-09-12 18:00:00 +02:00
|
|
|
static bool in_log_streamer = false;
|
2022-05-12 21:17:30 +02:00
|
|
|
|
2022-02-23 14:24:43 +01:00
|
|
|
/* Flag to indicate if child process exited unexpectedly */
|
|
|
|
static volatile sig_atomic_t bgchild_exited = false;
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
/* End position for xlog streaming, empty string if unknown yet */
|
|
|
|
static XLogRecPtr xlogendptr;
|
2012-06-10 21:20:04 +02:00
|
|
|
|
2011-12-11 00:15:15 +01:00
|
|
|
#ifndef WIN32
|
2011-10-26 20:13:33 +02:00
|
|
|
static int has_xlogendptr = 0;
|
2011-12-11 00:15:15 +01:00
|
|
|
#else
|
|
|
|
static volatile LONG has_xlogendptr = 0;
|
|
|
|
#endif
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2018-11-25 16:31:16 +01:00
|
|
|
/* Contents of configuration file to be generated */
|
2013-01-05 16:54:06 +01:00
|
|
|
static PQExpBuffer recoveryconfcontents = NULL;
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/* Function headers */
|
|
|
|
static void usage(void);
|
2016-09-12 18:00:00 +02:00
|
|
|
static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
static void progress_update_filename(const char *filename);
|
|
|
|
static void progress_report(int tablespacenum, bool force, bool finished);
|
|
|
|
|
|
|
|
static bbstreamer *CreateBackupStreamer(char *archive_name, char *spclocation,
|
|
|
|
bbstreamer **manifest_inject_streamer_p,
|
2021-11-09 20:21:35 +01:00
|
|
|
bool is_recovery_guc_supported,
|
2022-03-23 14:19:14 +01:00
|
|
|
bool expect_unterminated_tarfile,
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
pg_compress_specification *compress);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
static void ReceiveArchiveStreamChunk(size_t r, char *copybuf,
|
|
|
|
void *callback_data);
|
|
|
|
static char GetCopyDataByte(size_t r, char *copybuf, size_t *cursor);
|
|
|
|
static char *GetCopyDataString(size_t r, char *copybuf, size_t *cursor);
|
|
|
|
static uint64 GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor);
|
|
|
|
static void GetCopyDataEnd(size_t r, char *copybuf, size_t cursor);
|
|
|
|
static void ReportCopyDataParseError(size_t r, char *copybuf);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
static void ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
bool tablespacenum, pg_compress_specification *compress);
|
2019-12-05 21:14:09 +01:00
|
|
|
static void ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
static void ReceiveBackupManifest(PGconn *conn);
|
|
|
|
static void ReceiveBackupManifestChunk(size_t r, char *copybuf,
|
|
|
|
void *callback_data);
|
|
|
|
static void ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf);
|
|
|
|
static void ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
|
|
|
|
void *callback_data);
|
2022-03-23 14:19:14 +01:00
|
|
|
static void BaseBackup(char *compression_algorithm, char *compression_detail,
|
|
|
|
CompressionLocation compressloc,
|
2023-12-20 15:49:12 +01:00
|
|
|
pg_compress_specification *client_compress,
|
|
|
|
char *incremental_manifest);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2012-07-31 16:11:11 +02:00
|
|
|
static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
|
|
|
|
bool segment_finished);
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2014-02-22 19:38:06 +01:00
|
|
|
static const char *get_tablespace_mapping(const char *dir);
|
|
|
|
static void tablespace_list_append(const char *arg);
|
|
|
|
|
2014-02-09 13:10:14 +01:00
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
static void
|
|
|
|
cleanup_directories_atexit(void)
|
|
|
|
{
|
|
|
|
if (success || in_log_streamer)
|
|
|
|
return;
|
|
|
|
|
2018-04-03 13:47:16 +02:00
|
|
|
if (!noclean && !checksum_failure)
|
2016-09-12 18:00:00 +02:00
|
|
|
{
|
|
|
|
if (made_new_pgdata)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("removing data directory \"%s\"", basedir);
|
2016-09-12 18:00:00 +02:00
|
|
|
if (!rmtree(basedir, true))
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("failed to remove data directory");
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
else if (found_existing_pgdata)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("removing contents of data directory \"%s\"", basedir);
|
2016-09-12 18:00:00 +02:00
|
|
|
if (!rmtree(basedir, false))
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("failed to remove contents of data directory");
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (made_new_xlogdir)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("removing WAL directory \"%s\"", xlog_dir);
|
2016-09-12 18:00:00 +02:00
|
|
|
if (!rmtree(xlog_dir, true))
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("failed to remove WAL directory");
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
else if (found_existing_xlogdir)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("removing contents of WAL directory \"%s\"", xlog_dir);
|
2016-09-12 18:00:00 +02:00
|
|
|
if (!rmtree(xlog_dir, false))
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("failed to remove contents of WAL directory");
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2018-04-03 13:47:16 +02:00
|
|
|
if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("data directory \"%s\" not removed at user's request", basedir);
|
2016-09-12 18:00:00 +02:00
|
|
|
|
|
|
|
if (made_new_xlogdir || found_existing_xlogdir)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("WAL directory \"%s\" not removed at user's request", xlog_dir);
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
|
2018-04-03 13:47:16 +02:00
|
|
|
if ((made_tablespace_dirs || found_tablespace_dirs) && !checksum_failure)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("changes to tablespace directories will not be undone");
|
2016-09-12 18:00:00 +02:00
|
|
|
}
|
|
|
|
|
2014-02-09 13:10:14 +01:00
|
|
|
static void
|
2018-12-29 13:21:47 +01:00
|
|
|
disconnect_atexit(void)
|
2014-02-09 13:10:14 +01:00
|
|
|
{
|
|
|
|
if (conn != NULL)
|
|
|
|
PQfinish(conn);
|
2018-12-29 13:21:47 +01:00
|
|
|
}
|
2014-02-09 13:10:14 +01:00
|
|
|
|
|
|
|
#ifndef WIN32
|
2022-02-23 14:24:43 +01:00
|
|
|
/*
|
|
|
|
* If the bgchild exits prematurely and raises a SIGCHLD signal, we can abort
|
|
|
|
* processing rather than wait until the backup has finished and error out at
|
|
|
|
* that time. On Windows, we use a background thread which can communicate
|
|
|
|
* without the need for a signal handler.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
sigchld_handler(SIGNAL_ARGS)
|
|
|
|
{
|
|
|
|
bgchild_exited = true;
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:21:47 +01:00
|
|
|
/*
|
|
|
|
* On windows, our background thread dies along with the process. But on
|
|
|
|
* Unix, if we have started a subprocess, we want to kill it off so it
|
|
|
|
* doesn't remain running trying to stream data.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
kill_bgchild_atexit(void)
|
|
|
|
{
|
2022-02-23 14:24:43 +01:00
|
|
|
if (bgchild > 0 && !bgchild_exited)
|
2014-02-09 13:10:14 +01:00
|
|
|
kill(bgchild, SIGTERM);
|
|
|
|
}
|
2018-12-29 13:21:47 +01:00
|
|
|
#endif
|
2014-02-09 13:10:14 +01:00
|
|
|
|
2014-02-22 19:38:06 +01:00
|
|
|
/*
|
|
|
|
* Split argument into old_dir and new_dir and append to tablespace mapping
|
|
|
|
* list.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
tablespace_list_append(const char *arg)
|
|
|
|
{
|
|
|
|
TablespaceListCell *cell = (TablespaceListCell *) pg_malloc0(sizeof(TablespaceListCell));
|
|
|
|
char *dst;
|
|
|
|
char *dst_ptr;
|
|
|
|
const char *arg_ptr;
|
|
|
|
|
|
|
|
dst_ptr = dst = cell->old_dir;
|
|
|
|
for (arg_ptr = arg; *arg_ptr; arg_ptr++)
|
|
|
|
{
|
|
|
|
if (dst_ptr - dst >= MAXPGPATH)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("directory name too long");
|
2014-02-22 19:38:06 +01:00
|
|
|
|
|
|
|
if (*arg_ptr == '\\' && *(arg_ptr + 1) == '=')
|
|
|
|
; /* skip backslash escaping = */
|
|
|
|
else if (*arg_ptr == '=' && (arg_ptr == arg || *(arg_ptr - 1) != '\\'))
|
|
|
|
{
|
|
|
|
if (*cell->new_dir)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("multiple \"=\" signs in tablespace mapping");
|
2014-02-22 19:38:06 +01:00
|
|
|
else
|
|
|
|
dst = dst_ptr = cell->new_dir;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
*dst_ptr++ = *arg_ptr;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!*cell->old_dir || !*cell->new_dir)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid tablespace mapping format \"%s\", must be \"OLDDIR=NEWDIR\"", arg);
|
2014-02-22 19:38:06 +01:00
|
|
|
|
|
|
|
/*
|
2022-10-21 14:21:55 +02:00
|
|
|
* All tablespaces are created with absolute directories, so specifying a
|
|
|
|
* non-absolute path here would just never match, possibly confusing
|
|
|
|
* users. Since we don't know whether the remote side is Windows or not,
|
|
|
|
* and it might be different than the local side, permit any path that
|
|
|
|
* could be absolute under either set of rules.
|
|
|
|
*
|
|
|
|
* (There is little practical risk of confusion here, because someone
|
|
|
|
* running entirely on Linux isn't likely to have a relative path that
|
|
|
|
* begins with a backslash or something that looks like a drive
|
|
|
|
* specification. If they do, and they also incorrectly believe that a
|
|
|
|
* relative path is acceptable here, we'll silently fail to warn them of
|
|
|
|
* their mistake, and the -T option will just not get applied, same as if
|
|
|
|
* they'd specified -T for a nonexistent tablespace.)
|
2014-02-22 19:38:06 +01:00
|
|
|
*/
|
2022-10-21 14:21:55 +02:00
|
|
|
if (!is_nonwindows_absolute_path(cell->old_dir) &&
|
|
|
|
!is_windows_absolute_path(cell->old_dir))
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("old directory is not an absolute path in tablespace mapping: %s",
|
|
|
|
cell->old_dir);
|
2014-02-22 19:38:06 +01:00
|
|
|
|
|
|
|
if (!is_absolute_path(cell->new_dir))
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("new directory is not an absolute path in tablespace mapping: %s",
|
|
|
|
cell->new_dir);
|
2014-02-22 19:38:06 +01:00
|
|
|
|
2017-11-01 15:20:05 +01:00
|
|
|
/*
|
|
|
|
* Comparisons done with these values should involve similarly
|
|
|
|
* canonicalized path values. This is particularly sensitive on Windows
|
|
|
|
* where path values may not necessarily use Unix slashes.
|
|
|
|
*/
|
2015-04-29 02:12:10 +02:00
|
|
|
canonicalize_path(cell->old_dir);
|
|
|
|
canonicalize_path(cell->new_dir);
|
|
|
|
|
2014-02-22 19:38:06 +01:00
|
|
|
if (tablespace_dirs.tail)
|
|
|
|
tablespace_dirs.tail->next = cell;
|
|
|
|
else
|
|
|
|
tablespace_dirs.head = cell;
|
|
|
|
tablespace_dirs.tail = cell;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
static void
|
|
|
|
usage(void)
|
|
|
|
{
|
2011-05-04 19:56:52 +02:00
|
|
|
printf(_("%s takes a base backup of a running PostgreSQL server.\n\n"),
|
2011-01-23 12:21:23 +01:00
|
|
|
progname);
|
|
|
|
printf(_("Usage:\n"));
|
|
|
|
printf(_(" %s [OPTION]...\n"), progname);
|
|
|
|
printf(_("\nOptions controlling the output:\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -D, --pgdata=DIRECTORY receive base backup into directory\n"));
|
|
|
|
printf(_(" -F, --format=p|t output format (plain (default), tar)\n"));
|
2023-12-26 10:54:36 +01:00
|
|
|
printf(_(" -i, --incremental=OLDMANIFEST\n"
|
|
|
|
" take incremental backup\n"));
|
2017-05-17 19:04:03 +02:00
|
|
|
printf(_(" -r, --max-rate=RATE maximum transfer rate to transfer data directory\n"
|
|
|
|
" (in kB/s, or use suffix \"k\" or \"M\")\n"));
|
|
|
|
printf(_(" -R, --write-recovery-conf\n"
|
2018-11-25 16:31:16 +01:00
|
|
|
" write configuration for replication\n"));
|
2022-04-11 07:39:25 +02:00
|
|
|
printf(_(" -t, --target=TARGET[:DETAIL]\n"
|
|
|
|
" backup target (if other than client)\n"));
|
2017-05-17 19:04:03 +02:00
|
|
|
printf(_(" -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
|
|
|
|
" relocate tablespace in OLDDIR to NEWDIR\n"));
|
2017-09-26 17:58:22 +02:00
|
|
|
printf(_(" --waldir=WALDIR location for the write-ahead log directory\n"));
|
2017-05-17 19:04:03 +02:00
|
|
|
printf(_(" -X, --wal-method=none|fetch|stream\n"
|
|
|
|
" include required WAL files with specified method\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -z, --gzip compress tar output\n"));
|
2022-03-23 14:19:14 +01:00
|
|
|
printf(_(" -Z, --compress=[{client|server}-]METHOD[:DETAIL]\n"
|
|
|
|
" compress on client or server as specified\n"));
|
2022-03-07 21:08:45 +01:00
|
|
|
printf(_(" -Z, --compress=none do not compress tar output\n"));
|
2011-01-23 12:21:23 +01:00
|
|
|
printf(_("\nGeneral options:\n"));
|
2017-05-17 19:04:03 +02:00
|
|
|
printf(_(" -c, --checkpoint=fast|spread\n"
|
2024-01-10 13:30:10 +01:00
|
|
|
" set fast or spread (default) checkpointing\n"));
|
2017-09-26 22:07:52 +02:00
|
|
|
printf(_(" -C, --create-slot create replication slot\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -l, --label=LABEL set backup label\n"));
|
2016-10-19 18:00:00 +02:00
|
|
|
printf(_(" -n, --no-clean do not clean up after errors\n"));
|
|
|
|
printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -P, --progress show progress information\n"));
|
2017-09-26 17:58:22 +02:00
|
|
|
printf(_(" -S, --slot=SLOTNAME replication slot to use\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -v, --verbose output verbose messages\n"));
|
|
|
|
printf(_(" -V, --version output version information, then exit\n"));
|
2020-05-01 11:49:52 +02:00
|
|
|
printf(_(" --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE\n"
|
|
|
|
" use algorithm for manifest checksums\n"));
|
|
|
|
printf(_(" --manifest-force-encode\n"
|
|
|
|
" hex encode all file names in manifest\n"));
|
|
|
|
printf(_(" --no-estimate-size do not estimate backup size in server side\n"));
|
|
|
|
printf(_(" --no-manifest suppress generation of backup manifest\n"));
|
2018-05-21 16:01:49 +02:00
|
|
|
printf(_(" --no-slot prevent creation of temporary replication slot\n"));
|
|
|
|
printf(_(" --no-verify-checksums\n"
|
|
|
|
" do not verify checksums\n"));
|
Allow using syncfs() in frontend utilities.
This commit allows specifying a --sync-method in several frontend
utilities that must synchronize many files to disk (initdb,
pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade).
On Linux, users can specify "syncfs" to synchronize the relevant
file systems instead of calling fsync() for every single file. In
many cases, using syncfs() is much faster.
As with recovery_init_sync_method, this new option comes with some
caveats. The descriptions of these caveats have been moved to a
new appendix section in the documentation.
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier, Thomas Munro, Robert Haas, Justin Pryzby
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
2023-09-07 01:27:16 +02:00
|
|
|
printf(_(" --sync-method=METHOD\n"
|
|
|
|
" set method for syncing files to disk\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -?, --help show this help, then exit\n"));
|
2011-01-23 12:21:23 +01:00
|
|
|
printf(_("\nConnection options:\n"));
|
2013-02-25 13:48:27 +01:00
|
|
|
printf(_(" -d, --dbname=CONNSTR connection string\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
|
|
|
|
printf(_(" -p, --port=PORT database server port number\n"));
|
2017-05-17 19:04:03 +02:00
|
|
|
printf(_(" -s, --status-interval=INTERVAL\n"
|
|
|
|
" time between status packets sent to server (in seconds)\n"));
|
2012-07-31 16:11:11 +02:00
|
|
|
printf(_(" -U, --username=NAME connect as specified database user\n"));
|
|
|
|
printf(_(" -w, --no-password never prompt for password\n"));
|
|
|
|
printf(_(" -W, --password force password prompt (should happen automatically)\n"));
|
2020-02-28 08:54:49 +01:00
|
|
|
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
|
2020-02-28 08:54:49 +01:00
|
|
|
printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
/*
|
2012-05-25 11:36:22 +02:00
|
|
|
* Called in the background process every time data is received.
|
2011-10-26 20:13:33 +02:00
|
|
|
* On Unix, we check to see if there is any data on our pipe
|
|
|
|
* (which would mean we have a stop position), and if it is, check if
|
|
|
|
* it is time to stop.
|
|
|
|
* On Windows, we are in a single process, so we can just check if it's
|
|
|
|
* time to stop.
|
|
|
|
*/
|
|
|
|
static bool
|
2012-07-31 16:11:11 +02:00
|
|
|
reached_end_position(XLogRecPtr segendpos, uint32 timeline,
|
|
|
|
bool segment_finished)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
|
|
|
if (!has_xlogendptr)
|
|
|
|
{
|
|
|
|
#ifndef WIN32
|
|
|
|
fd_set fds;
|
2022-07-16 08:42:15 +02:00
|
|
|
struct timeval tv = {0};
|
2011-10-26 20:13:33 +02:00
|
|
|
int r;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Don't have the end pointer yet - check our pipe to see if it has
|
|
|
|
* been sent yet.
|
|
|
|
*/
|
|
|
|
FD_ZERO(&fds);
|
|
|
|
FD_SET(bgpipe[0], &fds);
|
|
|
|
|
|
|
|
r = select(bgpipe[0] + 1, &fds, NULL, NULL, &tv);
|
|
|
|
if (r == 1)
|
|
|
|
{
|
2022-07-16 08:42:15 +02:00
|
|
|
char xlogend[64] = {0};
|
2012-06-24 17:51:37 +02:00
|
|
|
uint32 hi,
|
|
|
|
lo;
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2013-07-15 16:42:27 +02:00
|
|
|
r = read(bgpipe[0], xlogend, sizeof(xlogend) - 1);
|
2011-10-26 20:13:33 +02:00
|
|
|
if (r < 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not read from ready pipe: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2012-06-24 17:51:37 +02:00
|
|
|
if (sscanf(xlogend, "%X/%X", &hi, &lo) != 2)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not parse write-ahead log location \"%s\"",
|
|
|
|
xlogend);
|
2012-06-24 17:51:37 +02:00
|
|
|
xlogendptr = ((uint64) hi) << 32 | lo;
|
2011-10-26 20:13:33 +02:00
|
|
|
has_xlogendptr = 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Fall through to check if we've reached the point further
|
|
|
|
* already.
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* No data received on the pipe means we don't know the end
|
|
|
|
* position yet - so just say it's not time to stop yet.
|
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
|
|
|
|
/*
|
|
|
|
* On win32, has_xlogendptr is set by the main thread, so if it's not
|
|
|
|
* set here, we just go back and wait until it shows up.
|
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* At this point we have an end pointer, so compare it to the current
|
|
|
|
* position to figure out if it's time to stop.
|
|
|
|
*/
|
2012-06-24 17:51:37 +02:00
|
|
|
if (segendpos >= xlogendptr)
|
2011-10-26 20:13:33 +02:00
|
|
|
return true;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Have end pointer, but haven't reached it yet - so tell the caller to
|
|
|
|
* keep streaming.
|
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
PGconn *bgconn;
|
|
|
|
XLogRecPtr startptr;
|
2016-10-23 15:16:31 +02:00
|
|
|
char xlog[MAXPGPATH]; /* directory or tarfile depending on mode */
|
2011-10-26 20:13:33 +02:00
|
|
|
char *sysidentifier;
|
|
|
|
int timeline;
|
2022-04-12 10:28:17 +02:00
|
|
|
pg_compress_algorithm wal_compress_algorithm;
|
2022-03-23 18:25:26 +01:00
|
|
|
int wal_compress_level;
|
2011-10-26 20:13:33 +02:00
|
|
|
} logstreamer_param;
|
|
|
|
|
|
|
|
static int
|
2022-03-23 18:25:26 +01:00
|
|
|
LogStreamerMain(logstreamer_param *param)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
2022-07-16 08:42:15 +02:00
|
|
|
StreamCtl stream = {0};
|
2016-03-11 11:08:01 +01:00
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
in_log_streamer = true;
|
|
|
|
|
2016-03-11 11:08:01 +01:00
|
|
|
stream.startpos = param->startptr;
|
|
|
|
stream.timeline = param->timeline;
|
|
|
|
stream.sysidentifier = param->sysidentifier;
|
|
|
|
stream.stream_stop = reached_end_position;
|
2017-04-28 00:27:02 +02:00
|
|
|
#ifndef WIN32
|
|
|
|
stream.stop_socket = bgpipe[0];
|
|
|
|
#else
|
|
|
|
stream.stop_socket = PGINVALID_SOCKET;
|
|
|
|
#endif
|
2016-03-11 11:08:01 +01:00
|
|
|
stream.standby_message_timeout = standby_message_timeout;
|
|
|
|
stream.synchronous = false;
|
2019-09-04 06:21:11 +02:00
|
|
|
/* fsync happens at the end of pg_basebackup for all data */
|
|
|
|
stream.do_sync = false;
|
2016-03-11 11:08:01 +01:00
|
|
|
stream.mark_done = true;
|
|
|
|
stream.partial_suffix = NULL;
|
2017-01-16 13:56:43 +01:00
|
|
|
stream.replication_slot = replication_slot;
|
2016-10-23 15:16:31 +02:00
|
|
|
if (format == 'p')
|
Rework compression options of pg_receivewal
pg_receivewal includes since cada1af the option --compress, to allow the
compression of WAL segments using gzip, with a value of 0 (the default)
meaning that no compression can be used.
This commit introduces a new option, called --compression-method, able
to use as values "none", the default, and "gzip", to make things more
extensible. The case of --compress=0 becomes fuzzy with this option
layer, so we have made the choice to make pg_receivewal return an error
when using "none" and a non-zero compression level, meaning that the
authorized values of --compress are now [1,9] instead of [0,9]. Not
specifying --compress with "gzip" as compression method makes
pg_receivewal use the default of zlib instead (Z_DEFAULT_COMPRESSION).
The code in charge of finding the streaming start LSN when scanning the
existing archives is refactored and made more extensible. While on it,
rename "compression" to "compression_level" in walmethods.c, to reduce
the confusion with the introduction of the compression method, even if
the tar method used by pg_basebackup does not rely on the compression
method (yet, at least), but just on the compression level (this area
could be improved more, actually).
This is in preparation for an upcoming patch that adds LZ4 support to
pg_receivewal.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Jian Guo, Magnus Hagander, Dilip Kumar,
Robert Haas
Discussion: https://postgr.es/m/ZCm1J5vfyQ2E6dYvXz8si39HQ2gwxSZ3IpYaVgYa3lUwY88SLapx9EEnOf5uEwrddhx2twG7zYKjVeuP5MwZXCNPybtsGouDsAD1o2L_I5E=@pm.me
2021-11-04 03:10:31 +01:00
|
|
|
stream.walmethod = CreateWalDirectoryMethod(param->xlog,
|
2022-04-12 10:28:17 +02:00
|
|
|
PG_COMPRESSION_NONE, 0,
|
2019-09-04 06:21:11 +02:00
|
|
|
stream.do_sync);
|
2022-02-11 15:41:42 +01:00
|
|
|
else
|
|
|
|
stream.walmethod = CreateWalTarMethod(param->xlog,
|
2022-04-12 10:28:17 +02:00
|
|
|
param->wal_compress_algorithm,
|
2022-03-23 18:25:26 +01:00
|
|
|
param->wal_compress_level,
|
2022-02-11 15:41:42 +01:00
|
|
|
stream.do_sync);
|
2016-10-23 15:16:31 +02:00
|
|
|
|
2016-03-11 11:08:01 +01:00
|
|
|
if (!ReceiveXlogStream(param->bgconn, &stream))
|
2022-02-23 14:24:43 +01:00
|
|
|
{
|
2011-10-26 20:13:33 +02:00
|
|
|
/*
|
|
|
|
* Any errors will already have been reported in the function process,
|
|
|
|
* but we need to tell the parent that we didn't shutdown in a nice
|
|
|
|
* way.
|
|
|
|
*/
|
2022-02-23 14:24:43 +01:00
|
|
|
#ifdef WIN32
|
|
|
|
/*
|
|
|
|
* In order to signal the main thread of an ungraceful exit we set the
|
|
|
|
* same flag that we use on Unix to signal SIGCHLD.
|
|
|
|
*/
|
|
|
|
bgchild_exited = true;
|
|
|
|
#endif
|
2011-10-26 20:13:33 +02:00
|
|
|
return 1;
|
2022-02-23 14:24:43 +01:00
|
|
|
}
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2022-09-19 18:53:46 +02:00
|
|
|
if (!stream.walmethod->ops->finish(stream.walmethod))
|
2016-10-23 15:16:31 +02:00
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("could not finish writing WAL files: %m");
|
2022-02-23 14:24:43 +01:00
|
|
|
#ifdef WIN32
|
|
|
|
bgchild_exited = true;
|
|
|
|
#endif
|
2016-10-23 15:16:31 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
PQfinish(param->bgconn);
|
2016-10-25 18:57:56 +02:00
|
|
|
|
2022-09-19 18:53:46 +02:00
|
|
|
stream.walmethod->ops->free(stream.walmethod);
|
2016-10-25 18:57:56 +02:00
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initiate background process for receiving xlog during the backup.
|
|
|
|
* The background stream will use its own database connection so we can
|
|
|
|
* stream the logfile in parallel with the backups.
|
|
|
|
*/
|
|
|
|
static void
|
2022-03-23 14:19:14 +01:00
|
|
|
StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier,
|
2022-04-12 10:28:17 +02:00
|
|
|
pg_compress_algorithm wal_compress_algorithm,
|
2022-03-23 14:19:14 +01:00
|
|
|
int wal_compress_level)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
|
|
|
logstreamer_param *param;
|
2012-06-24 17:51:37 +02:00
|
|
|
uint32 hi,
|
|
|
|
lo;
|
2015-01-03 20:51:52 +01:00
|
|
|
char statusdir[MAXPGPATH];
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2012-10-02 21:35:10 +02:00
|
|
|
param = pg_malloc0(sizeof(logstreamer_param));
|
2011-10-26 20:13:33 +02:00
|
|
|
param->timeline = timeline;
|
|
|
|
param->sysidentifier = sysidentifier;
|
2022-04-12 10:28:17 +02:00
|
|
|
param->wal_compress_algorithm = wal_compress_algorithm;
|
2022-03-23 18:25:26 +01:00
|
|
|
param->wal_compress_level = wal_compress_level;
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
/* Convert the starting position */
|
2012-06-24 17:51:37 +02:00
|
|
|
if (sscanf(startpos, "%X/%X", &hi, &lo) != 2)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not parse write-ahead log location \"%s\"",
|
|
|
|
startpos);
|
2012-06-24 17:51:37 +02:00
|
|
|
param->startptr = ((uint64) hi) << 32 | lo;
|
2011-10-26 20:13:33 +02:00
|
|
|
/* Round off to even segment position */
|
Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-20 07:03:48 +02:00
|
|
|
param->startptr -= XLogSegmentOffset(param->startptr, WalSegSz);
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
#ifndef WIN32
|
|
|
|
/* Create our background pipe */
|
2012-03-29 05:24:07 +02:00
|
|
|
if (pipe(bgpipe) < 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create pipe for background process: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Get a second connection */
|
|
|
|
param->bgconn = GetConnection();
|
2012-05-27 11:05:24 +02:00
|
|
|
if (!param->bgconn)
|
|
|
|
/* Error message already written in GetConnection() */
|
|
|
|
exit(1);
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2016-10-20 17:24:37 +02:00
|
|
|
/* In post-10 cluster, pg_xlog has been renamed to pg_wal */
|
2016-10-23 15:16:31 +02:00
|
|
|
snprintf(param->xlog, sizeof(param->xlog), "%s/%s",
|
2016-10-20 17:24:37 +02:00
|
|
|
basedir,
|
|
|
|
PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
|
|
|
|
"pg_xlog" : "pg_wal");
|
2015-01-03 20:51:52 +01:00
|
|
|
|
2017-01-16 13:56:43 +01:00
|
|
|
/* Temporary replication slots are only supported in 10 and newer */
|
|
|
|
if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_TEMP_SLOTS)
|
2017-09-26 22:07:52 +02:00
|
|
|
temp_replication_slot = false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create replication slot if requested
|
|
|
|
*/
|
|
|
|
if (temp_replication_slot && !replication_slot)
|
2023-09-07 07:12:18 +02:00
|
|
|
replication_slot = psprintf("pg_basebackup_%u",
|
|
|
|
(unsigned int) PQbackendPID(param->bgconn));
|
2017-09-26 22:07:52 +02:00
|
|
|
if (temp_replication_slot || create_slot)
|
|
|
|
{
|
|
|
|
if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL,
|
2021-06-30 05:15:47 +02:00
|
|
|
temp_replication_slot, true, true, false, false))
|
2018-12-29 13:21:47 +01:00
|
|
|
exit(1);
|
2017-09-26 22:07:52 +02:00
|
|
|
|
|
|
|
if (verbose)
|
|
|
|
{
|
|
|
|
if (temp_replication_slot)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("created temporary replication slot \"%s\"",
|
|
|
|
replication_slot);
|
2017-09-26 22:07:52 +02:00
|
|
|
else
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("created replication slot \"%s\"",
|
|
|
|
replication_slot);
|
2017-09-26 22:07:52 +02:00
|
|
|
}
|
|
|
|
}
|
2015-01-03 20:51:52 +01:00
|
|
|
|
2016-10-23 15:16:31 +02:00
|
|
|
if (format == 'p')
|
2015-01-03 20:51:52 +01:00
|
|
|
{
|
2016-10-23 15:16:31 +02:00
|
|
|
/*
|
|
|
|
* Create pg_wal/archive_status or pg_xlog/archive_status (and thus
|
|
|
|
* pg_wal or pg_xlog) depending on the target server so we can write
|
|
|
|
* to basedir/pg_wal or basedir/pg_xlog as the directory entry in the
|
|
|
|
* tar file may arrive later.
|
|
|
|
*/
|
|
|
|
snprintf(statusdir, sizeof(statusdir), "%s/%s/archive_status",
|
|
|
|
basedir,
|
|
|
|
PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
|
|
|
|
"pg_xlog" : "pg_wal");
|
|
|
|
|
2018-04-07 23:45:39 +02:00
|
|
|
if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create directory \"%s\": %m", statusdir);
|
2023-12-20 15:49:12 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* For newer server versions, likewise create pg_wal/summaries
|
|
|
|
*/
|
2024-02-04 02:51:53 +01:00
|
|
|
if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_WAL_SUMMARIES)
|
2023-12-20 15:49:12 +01:00
|
|
|
{
|
|
|
|
char summarydir[MAXPGPATH];
|
|
|
|
|
|
|
|
snprintf(summarydir, sizeof(summarydir), "%s/%s/summaries",
|
2024-02-04 02:51:53 +01:00
|
|
|
basedir, "pg_wal");
|
2023-12-20 15:49:12 +01:00
|
|
|
|
2024-01-11 19:06:10 +01:00
|
|
|
if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
|
2023-12-20 15:49:12 +01:00
|
|
|
errno != EEXIST)
|
|
|
|
pg_fatal("could not create directory \"%s\": %m", summarydir);
|
|
|
|
}
|
2015-01-03 20:51:52 +01:00
|
|
|
}
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Start a child process and tell it to start streaming. On Unix, this is
|
|
|
|
* a fork(). On Windows, we create a thread.
|
|
|
|
*/
|
|
|
|
#ifndef WIN32
|
|
|
|
bgchild = fork();
|
|
|
|
if (bgchild == 0)
|
|
|
|
{
|
|
|
|
/* in child process */
|
2022-07-05 20:01:10 +02:00
|
|
|
exit(LogStreamerMain(param));
|
2011-10-26 20:13:33 +02:00
|
|
|
}
|
|
|
|
else if (bgchild < 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create background process: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Else we are in the parent process and all is well.
|
|
|
|
*/
|
2018-12-29 13:21:47 +01:00
|
|
|
atexit(kill_bgchild_atexit);
|
2011-10-26 20:13:33 +02:00
|
|
|
#else /* WIN32 */
|
|
|
|
bgchild = _beginthreadex(NULL, 0, (void *) LogStreamerMain, param, 0, NULL);
|
|
|
|
if (bgchild == 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create background thread: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/*
|
|
|
|
* Verify that the given directory exists and is empty. If it does not
|
|
|
|
* exist, it is created. If it exists but is not empty, an error will
|
2017-05-15 11:08:02 +02:00
|
|
|
* be given and the process ended.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
|
|
|
static void
|
2016-09-12 18:00:00 +02:00
|
|
|
verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
|
|
|
switch (pg_check_dir(dirname))
|
|
|
|
{
|
|
|
|
case 0:
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Does not exist, so create
|
|
|
|
*/
|
2018-04-07 23:45:39 +02:00
|
|
|
if (pg_mkdir_p(dirname, pg_dir_create_mode) == -1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create directory \"%s\": %m", dirname);
|
2016-09-12 18:00:00 +02:00
|
|
|
if (created)
|
|
|
|
*created = true;
|
2011-01-23 12:21:23 +01:00
|
|
|
return;
|
|
|
|
case 1:
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exists, empty
|
|
|
|
*/
|
2016-09-12 18:00:00 +02:00
|
|
|
if (found)
|
|
|
|
*found = true;
|
2011-01-23 12:21:23 +01:00
|
|
|
return;
|
|
|
|
case 2:
|
2013-02-17 00:52:50 +01:00
|
|
|
case 3:
|
|
|
|
case 4:
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Exists, not empty
|
|
|
|
*/
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("directory \"%s\" exists but is not empty", dirname);
|
2011-01-23 12:21:23 +01:00
|
|
|
case -1:
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Access problem
|
|
|
|
*/
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not access directory \"%s\": %m", dirname);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* Callback to update our notion of the current filename.
|
2022-02-17 21:03:40 +01:00
|
|
|
*
|
|
|
|
* No other code should modify progress_filename!
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
*/
|
|
|
|
static void
|
|
|
|
progress_update_filename(const char *filename)
|
|
|
|
{
|
2022-02-17 21:03:40 +01:00
|
|
|
/* We needn't maintain this variable if not doing verbose reports. */
|
|
|
|
if (showprogress && verbose)
|
|
|
|
{
|
2022-06-16 21:50:56 +02:00
|
|
|
free(progress_filename);
|
2022-02-17 21:03:40 +01:00
|
|
|
if (filename)
|
|
|
|
progress_filename = pg_strdup(filename);
|
|
|
|
else
|
|
|
|
progress_filename = NULL;
|
|
|
|
}
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Print a progress report based on the global variables. If verbose output
|
|
|
|
* is enabled, also print the current file name.
|
2014-02-09 12:47:09 +01:00
|
|
|
*
|
2020-08-17 08:27:29 +02:00
|
|
|
* Progress report is written at maximum once per second, unless the force
|
|
|
|
* parameter is set to true.
|
|
|
|
*
|
|
|
|
* If finished is set to true, this is the last progress report. The cursor
|
|
|
|
* is moved to the next line.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
|
|
|
static void
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
progress_report(int tablespacenum, bool force, bool finished)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
2014-02-09 12:47:09 +01:00
|
|
|
int percent;
|
2011-08-16 10:24:08 +02:00
|
|
|
char totaldone_str[32];
|
|
|
|
char totalsize_str[32];
|
2014-02-09 12:47:09 +01:00
|
|
|
pg_time_t now;
|
|
|
|
|
|
|
|
if (!showprogress)
|
|
|
|
return;
|
|
|
|
|
|
|
|
now = time(NULL);
|
2020-08-17 08:27:29 +02:00
|
|
|
if (now == last_progress_report && !force && !finished)
|
2014-02-09 12:47:09 +01:00
|
|
|
return; /* Max once per second */
|
|
|
|
|
|
|
|
last_progress_report = now;
|
2019-09-03 11:59:36 +02:00
|
|
|
percent = totalsize_kb ? (int) ((totaldone / 1024) * 100 / totalsize_kb) : 0;
|
2011-04-10 17:42:00 +02:00
|
|
|
|
2011-08-16 16:56:47 +02:00
|
|
|
/*
|
2011-10-26 23:23:33 +02:00
|
|
|
* Avoid overflowing past 100% or the full size. This may make the total
|
|
|
|
* size number change as we approach the end of the backup (the estimate
|
|
|
|
* will always be wrong if WAL is included), but that's better than having
|
|
|
|
* the done column be bigger than the total.
|
2011-08-16 16:56:47 +02:00
|
|
|
*/
|
2011-01-30 21:30:09 +01:00
|
|
|
if (percent > 100)
|
|
|
|
percent = 100;
|
2019-09-03 11:59:36 +02:00
|
|
|
if (totaldone / 1024 > totalsize_kb)
|
|
|
|
totalsize_kb = totaldone / 1024;
|
2011-01-30 21:30:09 +01:00
|
|
|
|
2021-09-15 09:19:01 +02:00
|
|
|
snprintf(totaldone_str, sizeof(totaldone_str), UINT64_FORMAT,
|
2012-07-31 16:11:11 +02:00
|
|
|
totaldone / 1024);
|
2021-09-15 09:19:01 +02:00
|
|
|
snprintf(totalsize_str, sizeof(totalsize_str), UINT64_FORMAT, totalsize_kb);
|
2011-08-16 10:24:08 +02:00
|
|
|
|
2013-01-17 14:38:49 +01:00
|
|
|
#define VERBOSE_FILENAME_LENGTH 35
|
2011-01-23 12:21:23 +01:00
|
|
|
if (verbose)
|
2011-03-19 16:38:50 +01:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
if (!progress_filename)
|
2011-03-19 16:38:50 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* No filename given, so clear the status line (used for last
|
|
|
|
* call)
|
|
|
|
*/
|
|
|
|
fprintf(stderr,
|
2013-01-17 14:43:33 +01:00
|
|
|
ngettext("%*s/%s kB (100%%), %d/%d tablespace %*s",
|
|
|
|
"%*s/%s kB (100%%), %d/%d tablespaces %*s",
|
2011-08-16 10:24:08 +02:00
|
|
|
tablespacecount),
|
2013-01-17 16:10:16 +01:00
|
|
|
(int) strlen(totalsize_str),
|
2012-07-31 16:11:11 +02:00
|
|
|
totaldone_str, totalsize_str,
|
2013-01-17 14:38:49 +01:00
|
|
|
tablespacenum, tablespacecount,
|
|
|
|
VERBOSE_FILENAME_LENGTH + 5, "");
|
2011-03-19 16:38:50 +01:00
|
|
|
else
|
2013-01-17 14:38:49 +01:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
bool truncate = (strlen(progress_filename) > VERBOSE_FILENAME_LENGTH);
|
2013-01-17 14:38:49 +01:00
|
|
|
|
2011-03-19 16:38:50 +01:00
|
|
|
fprintf(stderr,
|
2013-01-17 14:43:33 +01:00
|
|
|
ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",
|
|
|
|
"%*s/%s kB (%d%%), %d/%d tablespaces (%s%-*.*s)",
|
2011-08-16 10:24:08 +02:00
|
|
|
tablespacecount),
|
2013-01-17 16:10:16 +01:00
|
|
|
(int) strlen(totalsize_str),
|
2012-07-31 16:11:11 +02:00
|
|
|
totaldone_str, totalsize_str, percent,
|
2013-01-17 14:38:49 +01:00
|
|
|
tablespacenum, tablespacecount,
|
|
|
|
/* Prefix with "..." if we do leading truncation */
|
|
|
|
truncate ? "..." : "",
|
|
|
|
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
|
|
|
|
truncate ? VERBOSE_FILENAME_LENGTH - 3 : VERBOSE_FILENAME_LENGTH,
|
|
|
|
/* Truncate filename at beginning if it's too long */
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
truncate ? progress_filename + strlen(progress_filename) - VERBOSE_FILENAME_LENGTH + 3 : progress_filename);
|
2013-01-17 14:38:49 +01:00
|
|
|
}
|
2011-03-19 16:38:50 +01:00
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
else
|
2011-08-16 10:24:08 +02:00
|
|
|
fprintf(stderr,
|
2013-01-17 14:43:33 +01:00
|
|
|
ngettext("%*s/%s kB (%d%%), %d/%d tablespace",
|
|
|
|
"%*s/%s kB (%d%%), %d/%d tablespaces",
|
2011-08-16 10:24:08 +02:00
|
|
|
tablespacecount),
|
2013-01-17 16:10:16 +01:00
|
|
|
(int) strlen(totalsize_str),
|
2012-07-31 16:11:11 +02:00
|
|
|
totaldone_str, totalsize_str, percent,
|
|
|
|
tablespacenum, tablespacecount);
|
2011-08-17 09:52:35 +02:00
|
|
|
|
2020-08-17 08:27:29 +02:00
|
|
|
/*
|
|
|
|
* Stay on the same line if reporting to a terminal and we're not done
|
|
|
|
* yet.
|
|
|
|
*/
|
2020-08-18 12:13:09 +02:00
|
|
|
fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
2014-02-27 22:55:57 +01:00
|
|
|
static int32
|
|
|
|
parse_max_rate(char *src)
|
|
|
|
{
|
|
|
|
double result;
|
|
|
|
char *after_num;
|
|
|
|
char *suffix = NULL;
|
|
|
|
|
|
|
|
errno = 0;
|
|
|
|
result = strtod(src, &after_num);
|
|
|
|
if (src == after_num)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("transfer rate \"%s\" is not a valid value", src);
|
2014-02-27 22:55:57 +01:00
|
|
|
if (errno != 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid transfer rate \"%s\": %m", src);
|
2014-02-27 22:55:57 +01:00
|
|
|
|
|
|
|
if (result <= 0)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Reject obviously wrong values here.
|
|
|
|
*/
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("transfer rate must be greater than zero");
|
2014-02-27 22:55:57 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Evaluate suffix, after skipping over possible whitespace. Lack of
|
|
|
|
* suffix means kilobytes.
|
|
|
|
*/
|
|
|
|
while (*after_num != '\0' && isspace((unsigned char) *after_num))
|
|
|
|
after_num++;
|
|
|
|
|
|
|
|
if (*after_num != '\0')
|
|
|
|
{
|
|
|
|
suffix = after_num;
|
|
|
|
if (*after_num == 'k')
|
|
|
|
{
|
|
|
|
/* kilobyte is the expected unit. */
|
|
|
|
after_num++;
|
|
|
|
}
|
|
|
|
else if (*after_num == 'M')
|
|
|
|
{
|
|
|
|
after_num++;
|
|
|
|
result *= 1024.0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* The rest can only consist of white space. */
|
|
|
|
while (*after_num != '\0' && isspace((unsigned char) *after_num))
|
|
|
|
after_num++;
|
|
|
|
|
|
|
|
if (*after_num != '\0')
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid --max-rate unit: \"%s\"", suffix);
|
2014-02-27 22:55:57 +01:00
|
|
|
|
|
|
|
/* Valid integer? */
|
|
|
|
if ((uint64) result != (uint64) ((uint32) result))
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("transfer rate \"%s\" exceeds integer range", src);
|
2014-02-27 22:55:57 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The range is checked on the server side too, but avoid the server
|
|
|
|
* connection if a nonsensical value was passed.
|
|
|
|
*/
|
|
|
|
if (result < MAX_RATE_LOWER || result > MAX_RATE_UPPER)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("transfer rate \"%s\" is out of range", src);
|
2014-02-27 22:55:57 +01:00
|
|
|
|
|
|
|
return (int32) result;
|
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2022-01-21 03:08:43 +01:00
|
|
|
/*
|
2022-03-23 14:19:14 +01:00
|
|
|
* Basic parsing of a value specified for -Z/--compress.
|
|
|
|
*
|
|
|
|
* We're not concerned here with understanding exactly what behavior the
|
|
|
|
* user wants, but we do need to know whether the user is requesting client
|
|
|
|
* or server side compression or leaving it unspecified, and we need to
|
|
|
|
* separate the name of the compression algorithm from the detail string.
|
|
|
|
*
|
|
|
|
* For instance, if the user writes --compress client-lz4:6, we want to
|
|
|
|
* separate that into (a) client-side compression, (b) algorithm "lz4",
|
|
|
|
* and (c) detail "6". Note, however, that all the client/server prefix is
|
|
|
|
* optional, and so is the detail. The algorithm name is required, unless
|
|
|
|
* the whole string is an integer, in which case we assume "gzip" as the
|
|
|
|
* algorithm and use the integer as the detail.
|
|
|
|
*
|
|
|
|
* We're not concerned with validation at this stage, so if the user writes
|
|
|
|
* --compress client-turkey:sandwich, the requested algorithm is "turkey"
|
|
|
|
* and the detail string is "sandwich". We'll sort out whether that's legal
|
|
|
|
* at a later stage.
|
2022-01-21 03:08:43 +01:00
|
|
|
*/
|
|
|
|
static void
|
Refactor code parsing compression option values (-Z/--compress)
This commit moves the code in charge of deparsing the method and detail
strings fed later to parse_compress_specification() to a common routine,
where the backward-compatible case of only an integer being found (N
= 0 => "none", N > 1 => gzip at level N) is handled.
Note that this has a side-effect for pg_basebackup, as we now attempt to
detect "server-" and "client-" before checking for the integer-only
pre-14 grammar, where values like server-N and client-N (without the
follow-up detail string) are now valid rather than failing because of an
unsupported method name. Past grammars are still handled the same way,
but these flavors are now authorized, and would now switch to consider N
= 0 as no compression and N > 1 as gzip with the compression level used
as N, with the caller still controlling if the compression method should
be done server-side, client-side or is unspecified. The documentation
of pg_basebackup is updated to reflect that.
This benefits other code paths that would like to rely on the same logic
as pg_basebackup and pg_receivewal with option values used for
compression specifications, one area discussed lately being pg_dump.
Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-30 01:34:32 +01:00
|
|
|
backup_parse_compress_options(char *option, char **algorithm, char **detail,
|
|
|
|
CompressionLocation *locationres)
|
2022-01-21 03:08:43 +01:00
|
|
|
{
|
2022-01-28 17:40:53 +01:00
|
|
|
/*
|
Refactor code parsing compression option values (-Z/--compress)
This commit moves the code in charge of deparsing the method and detail
strings fed later to parse_compress_specification() to a common routine,
where the backward-compatible case of only an integer being found (N
= 0 => "none", N > 1 => gzip at level N) is handled.
Note that this has a side-effect for pg_basebackup, as we now attempt to
detect "server-" and "client-" before checking for the integer-only
pre-14 grammar, where values like server-N and client-N (without the
follow-up detail string) are now valid rather than failing because of an
unsupported method name. Past grammars are still handled the same way,
but these flavors are now authorized, and would now switch to consider N
= 0 as no compression and N > 1 as gzip with the compression level used
as N, with the caller still controlling if the compression method should
be done server-side, client-side or is unspecified. The documentation
of pg_basebackup is updated to reflect that.
This benefits other code paths that would like to rely on the same logic
as pg_basebackup and pg_receivewal with option values used for
compression specifications, one area discussed lately being pg_dump.
Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-30 01:34:32 +01:00
|
|
|
* Strip off any "client-" or "server-" prefix, calculating the location.
|
2022-01-21 03:08:43 +01:00
|
|
|
*/
|
2022-03-23 14:19:14 +01:00
|
|
|
if (strncmp(option, "server-", 7) == 0)
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
{
|
|
|
|
*locationres = COMPRESS_LOCATION_SERVER;
|
2022-03-23 14:19:14 +01:00
|
|
|
option += 7;
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
}
|
2022-03-23 14:19:14 +01:00
|
|
|
else if (strncmp(option, "client-", 7) == 0)
|
2022-02-11 15:41:42 +01:00
|
|
|
{
|
|
|
|
*locationres = COMPRESS_LOCATION_CLIENT;
|
2022-03-23 14:19:14 +01:00
|
|
|
option += 7;
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
}
|
2022-01-21 03:08:43 +01:00
|
|
|
else
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
*locationres = COMPRESS_LOCATION_UNSPECIFIED;
|
2022-01-23 18:51:38 +01:00
|
|
|
|
Refactor code parsing compression option values (-Z/--compress)
This commit moves the code in charge of deparsing the method and detail
strings fed later to parse_compress_specification() to a common routine,
where the backward-compatible case of only an integer being found (N
= 0 => "none", N > 1 => gzip at level N) is handled.
Note that this has a side-effect for pg_basebackup, as we now attempt to
detect "server-" and "client-" before checking for the integer-only
pre-14 grammar, where values like server-N and client-N (without the
follow-up detail string) are now valid rather than failing because of an
unsupported method name. Past grammars are still handled the same way,
but these flavors are now authorized, and would now switch to consider N
= 0 as no compression and N > 1 as gzip with the compression level used
as N, with the caller still controlling if the compression method should
be done server-side, client-side or is unspecified. The documentation
of pg_basebackup is updated to reflect that.
This benefits other code paths that would like to rely on the same logic
as pg_basebackup and pg_receivewal with option values used for
compression specifications, one area discussed lately being pg_dump.
Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-30 01:34:32 +01:00
|
|
|
/* fallback to the common parsing for the algorithm and detail */
|
|
|
|
parse_compress_options(option, algorithm, detail);
|
2022-01-21 03:08:43 +01:00
|
|
|
}
|
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
/*
|
|
|
|
* Read a stream of COPY data and invoke the provided callback for each
|
|
|
|
* chunk.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveCopyData(PGconn *conn, WriteDataCallback callback,
|
|
|
|
void *callback_data)
|
|
|
|
{
|
|
|
|
PGresult *res;
|
|
|
|
|
|
|
|
/* Get the COPY data stream. */
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_COPY_OUT)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not get COPY data stream: %s",
|
|
|
|
PQerrorMessage(conn));
|
2019-12-05 21:14:09 +01:00
|
|
|
PQclear(res);
|
|
|
|
|
|
|
|
/* Loop over chunks until done. */
|
|
|
|
while (1)
|
|
|
|
{
|
|
|
|
int r;
|
|
|
|
char *copybuf;
|
|
|
|
|
|
|
|
r = PQgetCopyData(conn, ©buf, 0);
|
|
|
|
if (r == -1)
|
|
|
|
{
|
|
|
|
/* End of chunk. */
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
else if (r == -2)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not read COPY data: %s",
|
|
|
|
PQerrorMessage(conn));
|
2019-12-05 21:14:09 +01:00
|
|
|
|
2022-02-23 14:24:43 +01:00
|
|
|
if (bgchild_exited)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("background process terminated unexpectedly");
|
2022-02-23 14:24:43 +01:00
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
(*callback) (r, copybuf, callback_data);
|
|
|
|
|
|
|
|
PQfreemem(copybuf);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-01-05 16:54:06 +01:00
|
|
|
/*
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
* Figure out what to do with an archive received from the server based on
|
|
|
|
* the options selected by the user. We may just write the results directly
|
|
|
|
* to a file, or we might compress first, or we might extract the tar file
|
|
|
|
* and write each member separately. This function doesn't do any of that
|
|
|
|
* directly, but it works out what kind of bbstreamer we need to create so
|
|
|
|
* that the right stuff happens when, down the road, we actually receive
|
|
|
|
* the data.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
static bbstreamer *
|
|
|
|
CreateBackupStreamer(char *archive_name, char *spclocation,
|
|
|
|
bbstreamer **manifest_inject_streamer_p,
|
2021-11-09 20:21:35 +01:00
|
|
|
bool is_recovery_guc_supported,
|
2022-03-23 14:19:14 +01:00
|
|
|
bool expect_unterminated_tarfile,
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
pg_compress_specification *compress)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
2022-01-21 03:08:43 +01:00
|
|
|
bbstreamer *streamer = NULL;
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
bbstreamer *manifest_inject_streamer = NULL;
|
|
|
|
bool inject_manifest;
|
2022-01-28 14:41:25 +01:00
|
|
|
bool is_tar,
|
2022-02-11 15:41:42 +01:00
|
|
|
is_tar_gz,
|
2022-03-07 21:08:45 +01:00
|
|
|
is_tar_lz4,
|
2022-03-11 18:22:02 +01:00
|
|
|
is_tar_zstd,
|
|
|
|
is_compressed_tar;
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
bool must_parse_archive;
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
int archive_name_len = strlen(archive_name);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* Normally, we emit the backup manifest as a separate file, but when
|
|
|
|
* we're writing a tarfile to stdout, we don't have that option, so
|
|
|
|
* include it in the one tarfile we've got.
|
|
|
|
*/
|
|
|
|
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
/* Is this a tar archive? */
|
|
|
|
is_tar = (archive_name_len > 4 &&
|
|
|
|
strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
|
|
|
|
|
2022-03-11 18:35:13 +01:00
|
|
|
/* Is this a .tar.gz archive? */
|
|
|
|
is_tar_gz = (archive_name_len > 7 &&
|
|
|
|
strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
|
2022-01-28 14:41:25 +01:00
|
|
|
|
2022-03-11 18:35:13 +01:00
|
|
|
/* Is this a .tar.lz4 archive? */
|
2022-02-11 15:41:42 +01:00
|
|
|
is_tar_lz4 = (archive_name_len > 8 &&
|
2022-03-11 18:35:13 +01:00
|
|
|
strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
|
2022-02-11 15:41:42 +01:00
|
|
|
|
2022-03-11 18:35:13 +01:00
|
|
|
/* Is this a .tar.zst archive? */
|
2022-03-07 21:08:45 +01:00
|
|
|
is_tar_zstd = (archive_name_len > 8 &&
|
2022-03-11 18:35:13 +01:00
|
|
|
strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
|
2022-03-07 21:08:45 +01:00
|
|
|
|
2022-03-11 18:22:02 +01:00
|
|
|
/* Is this any kind of compressed tar? */
|
|
|
|
is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Injecting the manifest into a compressed tar file would be possible if
|
|
|
|
* we decompressed it, parsed the tarfile, generated a new tarfile, and
|
|
|
|
* recompressed it, but compressing and decompressing multiple times just
|
|
|
|
* to inject the manifest seems inefficient enough that it's probably not
|
|
|
|
* what the user wants. So, instead, reject the request and tell the user
|
|
|
|
* to specify something more reasonable.
|
|
|
|
*/
|
|
|
|
if (inject_manifest && is_compressed_tar)
|
|
|
|
{
|
2022-09-25 00:38:35 +02:00
|
|
|
pg_log_error("cannot inject manifest into a compressed tar file");
|
|
|
|
pg_log_error_hint("Use client-side compression, send the output to a directory rather than standard output, or use %s.",
|
|
|
|
"--no-manifest");
|
2022-03-11 18:22:02 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* We have to parse the archive if (1) we're suppose to extract it, or if
|
|
|
|
* (2) we need to inject backup_manifest or recovery configuration into
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
* it. However, we only know how to parse tar archives.
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
*/
|
|
|
|
must_parse_archive = (format == 'p' || inject_manifest ||
|
|
|
|
(spclocation == NULL && writerecoveryconf));
|
2019-03-08 02:14:03 +01:00
|
|
|
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
/* At present, we only know how to parse tar archives. */
|
2022-03-11 18:22:02 +01:00
|
|
|
if (must_parse_archive && !is_tar && !is_compressed_tar)
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
{
|
2022-09-25 00:38:35 +02:00
|
|
|
pg_log_error("cannot parse archive \"%s\"", archive_name);
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_detail("Only tar archives can be parsed.");
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
if (format == 'p')
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_detail("Plain format requires pg_basebackup to parse the archive.");
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
if (inject_manifest)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_detail("Using - as the output directory requires pg_basebackup to parse the archive.");
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
if (writerecoveryconf)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_detail("The -R option requires pg_basebackup to parse the archive.");
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
if (format == 'p')
|
2012-07-31 16:11:11 +02:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
const char *directory;
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/*
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
* In plain format, we must extract the archive. The data for the main
|
|
|
|
* tablespace will be written to the base directory, and the data for
|
|
|
|
* other tablespaces will be written to the directory where they're
|
|
|
|
* located on the server, after applying any user-specified tablespace
|
|
|
|
* mappings.
|
Fix pg_basebackup with in-place tablespaces some more.
Commit c6f2f01611d4f2c412e92eb7893f76fa590818e8 purported to make
this work, but problems remained. In a plain-format backup, the
files from an in-place tablespace got included in the tar file for
the main tablespace, which is wrong but it's not clear that it
has any user-visible consequences. In a tar-format backup, the
TABLESPACE_MAP option is used, and so we never iterated over
pg_tblspc and thus never backed up the in-place tablespaces
anywhere at all.
To fix this, reverse the changes in that commit, so that when we scan
pg_tblspc during a backup, we create tablespaceinfo objects even for
in-place tablespaces. We set the field that would normally contain the
absolute pathname to the relative path pg_tblspc/${TSOID}, and that's
good enough to make basebackup.c happy without any further changes.
However, pg_basebackup needs a couple of adjustments to make it work.
First, it needs to understand that a relative path for a tablespace
means it's an in-place tablespace. Second, it needs to tolerate the
situation where restoring the main tablespace tries to create
pg_tblspc or a subdirectory and finds that it already exists, because
we restore user-defined tablespaces before the main tablespace.
Since in-place tablespaces are only intended for use in development
and testing, no back-patch.
Patch by me, reviewed by Thomas Munro and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmobwvbEp+fLq2PykMYzizcvuNv0a7gPMJtxOTMOuuRLMHg@mail.gmail.com
2023-04-18 17:23:34 +02:00
|
|
|
*
|
|
|
|
* In the case of an in-place tablespace, spclocation will be a
|
|
|
|
* relative path. We just convert it to an absolute path by prepending
|
|
|
|
* basedir.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Fix pg_basebackup with in-place tablespaces some more.
Commit c6f2f01611d4f2c412e92eb7893f76fa590818e8 purported to make
this work, but problems remained. In a plain-format backup, the
files from an in-place tablespace got included in the tar file for
the main tablespace, which is wrong but it's not clear that it
has any user-visible consequences. In a tar-format backup, the
TABLESPACE_MAP option is used, and so we never iterated over
pg_tblspc and thus never backed up the in-place tablespaces
anywhere at all.
To fix this, reverse the changes in that commit, so that when we scan
pg_tblspc during a backup, we create tablespaceinfo objects even for
in-place tablespaces. We set the field that would normally contain the
absolute pathname to the relative path pg_tblspc/${TSOID}, and that's
good enough to make basebackup.c happy without any further changes.
However, pg_basebackup needs a couple of adjustments to make it work.
First, it needs to understand that a relative path for a tablespace
means it's an in-place tablespace. Second, it needs to tolerate the
situation where restoring the main tablespace tries to create
pg_tblspc or a subdirectory and finds that it already exists, because
we restore user-defined tablespaces before the main tablespace.
Since in-place tablespaces are only intended for use in development
and testing, no back-patch.
Patch by me, reviewed by Thomas Munro and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmobwvbEp+fLq2PykMYzizcvuNv0a7gPMJtxOTMOuuRLMHg@mail.gmail.com
2023-04-18 17:23:34 +02:00
|
|
|
if (spclocation == NULL)
|
|
|
|
directory = basedir;
|
|
|
|
else if (!is_absolute_path(spclocation))
|
|
|
|
directory = psprintf("%s/%s", basedir, spclocation);
|
|
|
|
else
|
|
|
|
directory = get_tablespace_mapping(spclocation);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
streamer = bbstreamer_extractor_new(directory,
|
|
|
|
get_tablespace_mapping,
|
|
|
|
progress_update_filename);
|
2012-07-31 16:11:11 +02:00
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
else
|
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
FILE *archive_file;
|
|
|
|
char archive_filename[MAXPGPATH];
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/*
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
* In tar format, we just write the archive without extracting it.
|
|
|
|
* Normally, we write it to the archive name provided by the caller,
|
|
|
|
* but when the base directory is "-" that means we need to write to
|
|
|
|
* standard output.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
if (strcmp(basedir, "-") == 0)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
snprintf(archive_filename, sizeof(archive_filename), "-");
|
|
|
|
archive_file = stdout;
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
snprintf(archive_filename, sizeof(archive_filename),
|
|
|
|
"%s/%s", basedir, archive_name);
|
|
|
|
archive_file = NULL;
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
if (compress->algorithm == PG_COMPRESSION_NONE)
|
2022-01-21 03:08:43 +01:00
|
|
|
streamer = bbstreamer_plain_writer_new(archive_filename,
|
|
|
|
archive_file);
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
else if (compress->algorithm == PG_COMPRESSION_GZIP)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
strlcat(archive_filename, ".gz", sizeof(archive_filename));
|
|
|
|
streamer = bbstreamer_gzip_writer_new(archive_filename,
|
2022-03-23 14:19:14 +01:00
|
|
|
archive_file, compress);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
else if (compress->algorithm == PG_COMPRESSION_LZ4)
|
2022-02-11 15:41:42 +01:00
|
|
|
{
|
|
|
|
strlcat(archive_filename, ".lz4", sizeof(archive_filename));
|
|
|
|
streamer = bbstreamer_plain_writer_new(archive_filename,
|
|
|
|
archive_file);
|
2022-03-23 14:19:14 +01:00
|
|
|
streamer = bbstreamer_lz4_compressor_new(streamer, compress);
|
2022-02-11 15:41:42 +01:00
|
|
|
}
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
|
2022-03-07 21:08:45 +01:00
|
|
|
{
|
|
|
|
strlcat(archive_filename, ".zst", sizeof(archive_filename));
|
|
|
|
streamer = bbstreamer_plain_writer_new(archive_filename,
|
|
|
|
archive_file);
|
2022-03-23 14:19:14 +01:00
|
|
|
streamer = bbstreamer_zstd_compressor_new(streamer, compress);
|
2022-03-07 21:08:45 +01:00
|
|
|
}
|
2022-01-21 03:08:43 +01:00
|
|
|
else
|
|
|
|
{
|
|
|
|
Assert(false); /* not reachable */
|
|
|
|
}
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we need to parse the archive for whatever reason, then we'll
|
|
|
|
* also need to re-archive, because, if the output format is tar, the
|
|
|
|
* only point of parsing the archive is to be able to inject stuff
|
|
|
|
* into it.
|
|
|
|
*/
|
|
|
|
if (must_parse_archive)
|
|
|
|
streamer = bbstreamer_tar_archiver_new(streamer);
|
2022-02-17 21:03:40 +01:00
|
|
|
progress_update_filename(archive_filename);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
}
|
2019-12-05 21:14:09 +01:00
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/*
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
* If we're supposed to inject the backup manifest into the results, it
|
|
|
|
* should be done here, so that the file content can be injected directly,
|
|
|
|
* without worrying about the details of the tar format.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
if (inject_manifest)
|
|
|
|
manifest_inject_streamer = streamer;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* If this is the main tablespace and we're supposed to write recovery
|
|
|
|
* information, arrange to do that.
|
|
|
|
*/
|
|
|
|
if (spclocation == NULL && writerecoveryconf)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
Assert(must_parse_archive);
|
|
|
|
streamer = bbstreamer_recovery_injector_new(streamer,
|
|
|
|
is_recovery_guc_supported,
|
|
|
|
recoveryconfcontents);
|
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* If we're doing anything that involves understanding the contents of the
|
2021-11-08 22:36:06 +01:00
|
|
|
* archive, we'll need to parse it. If not, we can skip parsing it, but
|
2021-11-09 20:21:35 +01:00
|
|
|
* old versions of the server send improperly terminated tarfiles, so if
|
|
|
|
* we're talking to such a server we'll need to add the terminator here.
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
*/
|
|
|
|
if (must_parse_archive)
|
|
|
|
streamer = bbstreamer_tar_parser_new(streamer);
|
2021-11-09 20:21:35 +01:00
|
|
|
else if (expect_unterminated_tarfile)
|
2021-11-08 22:36:06 +01:00
|
|
|
streamer = bbstreamer_tar_terminator_new(streamer);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2022-01-28 14:41:25 +01:00
|
|
|
/*
|
|
|
|
* If the user has requested a server compressed archive along with
|
|
|
|
* archive extraction at client then we need to decompress it.
|
|
|
|
*/
|
2022-03-23 14:19:14 +01:00
|
|
|
if (format == 'p')
|
2022-02-11 15:41:42 +01:00
|
|
|
{
|
2022-03-23 14:19:14 +01:00
|
|
|
if (is_tar_gz)
|
2022-02-11 15:41:42 +01:00
|
|
|
streamer = bbstreamer_gzip_decompressor_new(streamer);
|
2022-03-23 14:19:14 +01:00
|
|
|
else if (is_tar_lz4)
|
2022-02-11 15:41:42 +01:00
|
|
|
streamer = bbstreamer_lz4_decompressor_new(streamer);
|
2022-03-23 14:19:14 +01:00
|
|
|
else if (is_tar_zstd)
|
2022-03-07 21:08:45 +01:00
|
|
|
streamer = bbstreamer_zstd_decompressor_new(streamer);
|
2022-02-11 15:41:42 +01:00
|
|
|
}
|
2022-01-28 14:41:25 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Return the results. */
|
|
|
|
*manifest_inject_streamer_p = manifest_inject_streamer;
|
|
|
|
return streamer;
|
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
/*
|
|
|
|
* Receive all of the archives the server wants to send - and the backup
|
|
|
|
* manifest if present - as a single COPY stream.
|
|
|
|
*/
|
|
|
|
static void
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
ReceiveArchiveStream(PGconn *conn, pg_compress_specification *compress)
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
{
|
|
|
|
ArchiveStreamState state;
|
|
|
|
|
|
|
|
/* Set up initial state. */
|
|
|
|
memset(&state, 0, sizeof(state));
|
|
|
|
state.tablespacenum = -1;
|
2022-03-23 14:19:14 +01:00
|
|
|
state.compress = compress;
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
|
|
|
|
/* All the real work happens in ReceiveArchiveStreamChunk. */
|
|
|
|
ReceiveCopyData(conn, ReceiveArchiveStreamChunk, &state);
|
|
|
|
|
|
|
|
/* If we wrote the backup manifest to a file, close the file. */
|
|
|
|
if (state.manifest_file !=NULL)
|
|
|
|
{
|
|
|
|
fclose(state.manifest_file);
|
|
|
|
state.manifest_file = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we buffered the backup manifest in order to inject it into the
|
|
|
|
* output tarfile, do that now.
|
|
|
|
*/
|
|
|
|
if (state.manifest_inject_streamer != NULL &&
|
|
|
|
state.manifest_buffer != NULL)
|
|
|
|
{
|
|
|
|
bbstreamer_inject_file(state.manifest_inject_streamer,
|
|
|
|
"backup_manifest",
|
|
|
|
state.manifest_buffer->data,
|
|
|
|
state.manifest_buffer->len);
|
|
|
|
destroyPQExpBuffer(state.manifest_buffer);
|
|
|
|
state.manifest_buffer = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If there's still an archive in progress, end processing. */
|
|
|
|
if (state.streamer != NULL)
|
|
|
|
{
|
|
|
|
bbstreamer_finalize(state.streamer);
|
|
|
|
bbstreamer_free(state.streamer);
|
|
|
|
state.streamer = NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Receive one chunk of data sent by the server as part of a single COPY
|
|
|
|
* stream that includes all archives and the manifest.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveArchiveStreamChunk(size_t r, char *copybuf, void *callback_data)
|
|
|
|
{
|
|
|
|
ArchiveStreamState *state = callback_data;
|
|
|
|
size_t cursor = 0;
|
|
|
|
|
|
|
|
/* Each CopyData message begins with a type byte. */
|
|
|
|
switch (GetCopyDataByte(r, copybuf, &cursor))
|
|
|
|
{
|
|
|
|
case 'n':
|
|
|
|
{
|
|
|
|
/* New archive. */
|
|
|
|
char *archive_name;
|
|
|
|
char *spclocation;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We force a progress report at the end of each tablespace. A
|
|
|
|
* new tablespace starts when the previous one ends, except in
|
|
|
|
* the case of the very first one.
|
|
|
|
*/
|
|
|
|
if (++state->tablespacenum > 0)
|
|
|
|
progress_report(state->tablespacenum, true, false);
|
|
|
|
|
|
|
|
/* Sanity check. */
|
|
|
|
if (state->manifest_buffer != NULL ||
|
|
|
|
state->manifest_file !=NULL)
|
2022-09-25 00:38:35 +02:00
|
|
|
pg_fatal("archives must precede manifest");
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
|
|
|
|
/* Parse the rest of the CopyData message. */
|
|
|
|
archive_name = GetCopyDataString(r, copybuf, &cursor);
|
|
|
|
spclocation = GetCopyDataString(r, copybuf, &cursor);
|
|
|
|
GetCopyDataEnd(r, copybuf, cursor);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Basic sanity checks on the archive name: it shouldn't be
|
|
|
|
* empty, it shouldn't start with a dot, and it shouldn't
|
|
|
|
* contain a path separator.
|
|
|
|
*/
|
|
|
|
if (archive_name[0] == '\0' || archive_name[0] == '.' ||
|
|
|
|
strchr(archive_name, '/') != NULL ||
|
|
|
|
strchr(archive_name, '\\') != NULL)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid archive name: \"%s\"",
|
|
|
|
archive_name);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* An empty spclocation is treated as NULL. We expect this
|
|
|
|
* case to occur for the data directory itself, but not for
|
|
|
|
* any archives that correspond to tablespaces.
|
|
|
|
*/
|
|
|
|
if (spclocation[0] == '\0')
|
|
|
|
spclocation = NULL;
|
|
|
|
|
|
|
|
/* End processing of any prior archive. */
|
|
|
|
if (state->streamer != NULL)
|
|
|
|
{
|
|
|
|
bbstreamer_finalize(state->streamer);
|
|
|
|
bbstreamer_free(state->streamer);
|
|
|
|
state->streamer = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
* Create an appropriate backup streamer, unless a backup
|
|
|
|
* target was specified. In that case, it's up to the server
|
|
|
|
* to put the backup wherever it needs to go.
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target == NULL)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* We know that recovery GUCs are supported, because this
|
|
|
|
* protocol can only be used on v15+.
|
|
|
|
*/
|
|
|
|
state->streamer =
|
|
|
|
CreateBackupStreamer(archive_name,
|
|
|
|
spclocation,
|
|
|
|
&state->manifest_inject_streamer,
|
2022-03-23 14:19:14 +01:00
|
|
|
true, false,
|
|
|
|
state->compress);
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
}
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
case 'd':
|
|
|
|
{
|
|
|
|
/* Archive or manifest data. */
|
|
|
|
if (state->manifest_buffer != NULL)
|
|
|
|
{
|
|
|
|
/* Manifest data, buffer in memory. */
|
|
|
|
appendPQExpBuffer(state->manifest_buffer, copybuf + 1,
|
|
|
|
r - 1);
|
|
|
|
}
|
|
|
|
else if (state->manifest_file !=NULL)
|
|
|
|
{
|
|
|
|
/* Manifest data, write to disk. */
|
|
|
|
if (fwrite(copybuf + 1, r - 1, 1,
|
|
|
|
state->manifest_file) != 1)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* If fwrite() didn't set errno, assume that the
|
|
|
|
* problem is that we're out of disk space.
|
|
|
|
*/
|
|
|
|
if (errno == 0)
|
|
|
|
errno = ENOSPC;
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not write to file \"%s\": %m",
|
|
|
|
state->manifest_filename);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
else if (state->streamer != NULL)
|
|
|
|
{
|
|
|
|
/* Archive data. */
|
|
|
|
bbstreamer_content(state->streamer, NULL, copybuf + 1,
|
|
|
|
r - 1, BBSTREAMER_UNKNOWN);
|
|
|
|
}
|
|
|
|
else
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("unexpected payload data");
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
case 'p':
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Progress report.
|
|
|
|
*
|
|
|
|
* The remainder of the message is expected to be an 8-byte
|
|
|
|
* count of bytes completed.
|
|
|
|
*/
|
|
|
|
totaldone = GetCopyDataUInt64(r, copybuf, &cursor);
|
|
|
|
GetCopyDataEnd(r, copybuf, cursor);
|
|
|
|
|
|
|
|
/*
|
2022-04-11 10:49:41 +02:00
|
|
|
* The server shouldn't send progress report messages too
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
* often, so we force an update each time we receive one.
|
|
|
|
*/
|
|
|
|
progress_report(state->tablespacenum, true, false);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
case 'm':
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Manifest data will be sent next. This message is not
|
|
|
|
* expected to have any further payload data.
|
|
|
|
*/
|
|
|
|
GetCopyDataEnd(r, copybuf, cursor);
|
|
|
|
|
|
|
|
/*
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
* If a backup target was specified, figuring out where to put
|
|
|
|
* the manifest is the server's problem. Otherwise, we need to
|
|
|
|
* deal with it.
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target == NULL)
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
{
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* If we're supposed inject the manifest into the archive,
|
|
|
|
* we prepare to buffer it in memory; otherwise, we
|
|
|
|
* prepare to write it to a temporary file.
|
|
|
|
*/
|
|
|
|
if (state->manifest_inject_streamer != NULL)
|
|
|
|
state->manifest_buffer = createPQExpBuffer();
|
|
|
|
else
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
{
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
snprintf(state->manifest_filename,
|
|
|
|
sizeof(state->manifest_filename),
|
|
|
|
"%s/backup_manifest.tmp", basedir);
|
|
|
|
state->manifest_file =
|
|
|
|
fopen(state->manifest_filename, "wb");
|
|
|
|
if (state->manifest_file == NULL)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create file \"%s\": %m",
|
|
|
|
state->manifest_filename);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
default:
|
|
|
|
ReportCopyDataParseError(r, copybuf);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get a single byte from a CopyData message.
|
|
|
|
*
|
|
|
|
* Bail out if none remain.
|
|
|
|
*/
|
|
|
|
static char
|
|
|
|
GetCopyDataByte(size_t r, char *copybuf, size_t *cursor)
|
|
|
|
{
|
|
|
|
if (*cursor >= r)
|
|
|
|
ReportCopyDataParseError(r, copybuf);
|
|
|
|
|
|
|
|
return copybuf[(*cursor)++];
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get a NUL-terminated string from a CopyData message.
|
|
|
|
*
|
|
|
|
* Bail out if the terminating NUL cannot be found.
|
|
|
|
*/
|
|
|
|
static char *
|
|
|
|
GetCopyDataString(size_t r, char *copybuf, size_t *cursor)
|
|
|
|
{
|
|
|
|
size_t startpos = *cursor;
|
|
|
|
size_t endpos = startpos;
|
|
|
|
|
|
|
|
while (1)
|
|
|
|
{
|
|
|
|
if (endpos >= r)
|
|
|
|
ReportCopyDataParseError(r, copybuf);
|
|
|
|
if (copybuf[endpos] == '\0')
|
|
|
|
break;
|
|
|
|
++endpos;
|
|
|
|
}
|
|
|
|
|
|
|
|
*cursor = endpos + 1;
|
|
|
|
return ©buf[startpos];
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get an unsigned 64-bit integer from a CopyData message.
|
|
|
|
*
|
|
|
|
* Bail out if there are not at least 8 bytes remaining.
|
|
|
|
*/
|
|
|
|
static uint64
|
|
|
|
GetCopyDataUInt64(size_t r, char *copybuf, size_t *cursor)
|
|
|
|
{
|
|
|
|
uint64 result;
|
|
|
|
|
|
|
|
if (*cursor + sizeof(uint64) > r)
|
|
|
|
ReportCopyDataParseError(r, copybuf);
|
|
|
|
memcpy(&result, ©buf[*cursor], sizeof(uint64));
|
|
|
|
*cursor += sizeof(uint64);
|
|
|
|
return pg_ntoh64(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Bail out if we didn't parse the whole message.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
GetCopyDataEnd(size_t r, char *copybuf, size_t cursor)
|
|
|
|
{
|
|
|
|
if (r != cursor)
|
|
|
|
ReportCopyDataParseError(r, copybuf);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Report failure to parse a CopyData message from the server. Then exit.
|
|
|
|
*
|
|
|
|
* As a debugging aid, we try to give some hint about what kind of message
|
|
|
|
* provoked the failure. Perhaps this is not detailed enough, but it's not
|
2022-03-15 03:29:23 +01:00
|
|
|
* clear that it's worth expending any more code on what should be a
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
* can't-happen case.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReportCopyDataParseError(size_t r, char *copybuf)
|
|
|
|
{
|
|
|
|
if (r == 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("empty COPY message");
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
else
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("malformed COPY message of type %d, length %zu",
|
|
|
|
copybuf[0], r);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
}
|
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/*
|
|
|
|
* Receive raw tar data from the server, and stream it to the appropriate
|
|
|
|
* location. If we're writing a single tarfile to standard output, also
|
|
|
|
* receive the backup manifest and inject it into that tarfile.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveTarFile(PGconn *conn, char *archive_name, char *spclocation,
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
bool tablespacenum, pg_compress_specification *compress)
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
{
|
|
|
|
WriteTarState state;
|
|
|
|
bbstreamer *manifest_inject_streamer;
|
|
|
|
bool is_recovery_guc_supported;
|
2021-11-09 20:21:35 +01:00
|
|
|
bool expect_unterminated_tarfile;
|
2020-04-27 19:04:35 +02:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Pass all COPY data through to the backup streamer. */
|
|
|
|
memset(&state, 0, sizeof(state));
|
|
|
|
is_recovery_guc_supported =
|
|
|
|
PQserverVersion(conn) >= MINIMUM_VERSION_FOR_RECOVERY_GUC;
|
2021-11-09 20:21:35 +01:00
|
|
|
expect_unterminated_tarfile =
|
|
|
|
PQserverVersion(conn) < MINIMUM_VERSION_FOR_TERMINATED_TARFILE;
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
state.streamer = CreateBackupStreamer(archive_name, spclocation,
|
|
|
|
&manifest_inject_streamer,
|
2021-11-09 20:21:35 +01:00
|
|
|
is_recovery_guc_supported,
|
2022-03-23 14:19:14 +01:00
|
|
|
expect_unterminated_tarfile,
|
|
|
|
compress);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
state.tablespacenum = tablespacenum;
|
|
|
|
ReceiveCopyData(conn, ReceiveTarCopyChunk, &state);
|
2022-02-17 21:03:40 +01:00
|
|
|
progress_update_filename(NULL);
|
2013-01-05 16:54:06 +01:00
|
|
|
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
/*
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
* The decision as to whether we need to inject the backup manifest into
|
|
|
|
* the output at this stage is made by CreateBackupStreamer; if that is
|
|
|
|
* needed, manifest_inject_streamer will be non-NULL; otherwise, it will
|
|
|
|
* be NULL.
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
*/
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
if (manifest_inject_streamer != NULL)
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
{
|
|
|
|
PQExpBufferData buf;
|
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Slurp the entire backup manifest into a buffer. */
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
initPQExpBuffer(&buf);
|
|
|
|
ReceiveBackupManifestInMemory(conn, &buf);
|
|
|
|
if (PQExpBufferDataBroken(buf))
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("out of memory");
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Inject it into the output tarfile. */
|
|
|
|
bbstreamer_inject_file(manifest_inject_streamer, "backup_manifest",
|
|
|
|
buf.data, buf.len);
|
2019-12-05 21:14:09 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Free memory. */
|
|
|
|
termPQExpBuffer(&buf);
|
2019-12-05 21:14:09 +01:00
|
|
|
}
|
2013-01-05 16:54:06 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
/* Cleanup. */
|
|
|
|
bbstreamer_finalize(state.streamer);
|
|
|
|
bbstreamer_free(state.streamer);
|
|
|
|
|
|
|
|
progress_report(tablespacenum, true, false);
|
2018-11-25 16:31:16 +01:00
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
/*
|
|
|
|
* Do not sync the resulting tar file yet, all files are synced once at
|
|
|
|
* the end.
|
|
|
|
*/
|
|
|
|
}
|
2018-11-25 16:31:16 +01:00
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
/*
|
|
|
|
* Receive one chunk of tar-format data from the server.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveTarCopyChunk(size_t r, char *copybuf, void *callback_data)
|
|
|
|
{
|
|
|
|
WriteTarState *state = callback_data;
|
2018-11-25 16:31:16 +01:00
|
|
|
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
bbstreamer_content(state->streamer, NULL, copybuf, r, BBSTREAMER_UNKNOWN);
|
2018-11-25 16:31:16 +01:00
|
|
|
|
2019-12-05 21:14:09 +01:00
|
|
|
totaldone += r;
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
progress_report(state->tablespacenum, false, false);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
2014-02-22 19:38:06 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Retrieve tablespace path, either relocated or original depending on whether
|
|
|
|
* -T was passed or not.
|
|
|
|
*/
|
|
|
|
static const char *
|
|
|
|
get_tablespace_mapping(const char *dir)
|
|
|
|
{
|
|
|
|
TablespaceListCell *cell;
|
2017-11-01 15:20:05 +01:00
|
|
|
char canon_dir[MAXPGPATH];
|
|
|
|
|
|
|
|
/* Canonicalize path for comparison consistency */
|
|
|
|
strlcpy(canon_dir, dir, sizeof(canon_dir));
|
|
|
|
canonicalize_path(canon_dir);
|
2014-02-22 19:38:06 +01:00
|
|
|
|
|
|
|
for (cell = tablespace_dirs.head; cell; cell = cell->next)
|
2017-11-01 15:20:05 +01:00
|
|
|
if (strcmp(canon_dir, cell->old_dir) == 0)
|
2014-02-22 19:38:06 +01:00
|
|
|
return cell->new_dir;
|
|
|
|
|
|
|
|
return dir;
|
|
|
|
}
|
|
|
|
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
/*
|
|
|
|
* Receive the backup manifest file and write it out to a file.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveBackupManifest(PGconn *conn)
|
|
|
|
{
|
|
|
|
WriteManifestState state;
|
|
|
|
|
|
|
|
snprintf(state.filename, sizeof(state.filename),
|
|
|
|
"%s/backup_manifest.tmp", basedir);
|
|
|
|
state.file = fopen(state.filename, "wb");
|
|
|
|
if (state.file == NULL)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create file \"%s\": %m", state.filename);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
|
|
|
|
ReceiveCopyData(conn, ReceiveBackupManifestChunk, &state);
|
|
|
|
|
|
|
|
fclose(state.file);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Receive one chunk of the backup manifest file and write it out to a file.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
|
|
|
|
{
|
|
|
|
WriteManifestState *state = callback_data;
|
|
|
|
|
2020-06-19 22:46:07 +02:00
|
|
|
errno = 0;
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
if (fwrite(copybuf, r, 1, state->file) != 1)
|
|
|
|
{
|
2020-06-19 22:46:07 +02:00
|
|
|
/* if write didn't set errno, assume problem is no disk space */
|
|
|
|
if (errno == 0)
|
|
|
|
errno = ENOSPC;
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not write to file \"%s\": %m", state->filename);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Receive the backup manifest file and write it out to a file.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveBackupManifestInMemory(PGconn *conn, PQExpBuffer buf)
|
|
|
|
{
|
|
|
|
ReceiveCopyData(conn, ReceiveBackupManifestInMemoryChunk, buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Receive one chunk of the backup manifest file and write it out to a file.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
ReceiveBackupManifestInMemoryChunk(size_t r, char *copybuf,
|
|
|
|
void *callback_data)
|
|
|
|
{
|
|
|
|
PQExpBuffer buf = callback_data;
|
|
|
|
|
|
|
|
appendPQExpBuffer(buf, copybuf, r);
|
|
|
|
}
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
static void
|
2022-03-23 14:19:14 +01:00
|
|
|
BaseBackup(char *compression_algorithm, char *compression_detail,
|
2023-12-20 15:49:12 +01:00
|
|
|
CompressionLocation compressloc,
|
|
|
|
pg_compress_specification *client_compress,
|
|
|
|
char *incremental_manifest)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
|
|
|
PGresult *res;
|
2011-10-26 20:13:33 +02:00
|
|
|
char *sysidentifier;
|
2014-10-01 17:22:21 +02:00
|
|
|
TimeLineID latesttli;
|
|
|
|
TimeLineID starttli;
|
2014-02-27 22:55:57 +01:00
|
|
|
char *basebkp;
|
2011-01-23 12:21:23 +01:00
|
|
|
int i;
|
2011-02-03 13:46:23 +01:00
|
|
|
char xlogstart[64];
|
2022-07-16 08:42:15 +02:00
|
|
|
char xlogend[64] = {0};
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
int minServerMajor,
|
|
|
|
maxServerMajor;
|
2016-10-20 17:24:37 +02:00
|
|
|
int serverVersion,
|
|
|
|
serverMajor;
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
int writing_to_stdout;
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
bool use_new_option_syntax = false;
|
|
|
|
PQExpBufferData buf;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2016-10-20 17:24:37 +02:00
|
|
|
Assert(conn != NULL);
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
initPQExpBuffer(&buf);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
/*
|
|
|
|
* Check server version. BASE_BACKUP command was introduced in 9.1, so we
|
|
|
|
* can't work with servers older than 9.1.
|
|
|
|
*/
|
|
|
|
minServerMajor = 901;
|
|
|
|
maxServerMajor = PG_VERSION_NUM / 100;
|
2016-10-20 17:24:37 +02:00
|
|
|
serverVersion = PQserverVersion(conn);
|
|
|
|
serverMajor = serverVersion / 100;
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
if (serverMajor < minServerMajor || serverMajor > maxServerMajor)
|
|
|
|
{
|
|
|
|
const char *serverver = PQparameterStatus(conn, "server_version");
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("incompatible server version %s",
|
|
|
|
serverver ? serverver : "'unknown'");
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
}
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
if (serverMajor >= 1500)
|
|
|
|
use_new_option_syntax = true;
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If WAL streaming was requested, also check that the server is new
|
|
|
|
* enough for that.
|
|
|
|
*/
|
2017-01-09 16:03:47 +01:00
|
|
|
if (includewal == STREAM_WAL && !CheckServerVersionForStreaming(conn))
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
{
|
2017-01-04 10:40:38 +01:00
|
|
|
/*
|
|
|
|
* Error message already written in CheckServerVersionForStreaming(),
|
|
|
|
* but add a hint about using -X none.
|
|
|
|
*/
|
2022-09-25 00:38:35 +02:00
|
|
|
pg_log_error_hint("Use -X none or -X fetch to disable log streaming.");
|
2018-12-29 13:21:47 +01:00
|
|
|
exit(1);
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
}
|
|
|
|
|
2013-01-05 16:54:06 +01:00
|
|
|
/*
|
2024-03-21 06:18:59 +01:00
|
|
|
* Build contents of configuration file if requested.
|
|
|
|
*
|
|
|
|
* Note that we don't use the dbname from key-value pair in conn as that
|
|
|
|
* would have been filled by the default dbname (dbname=replication) in
|
|
|
|
* case the user didn't specify the one. The dbname written in the config
|
|
|
|
* file as part of primary_conninfo would be used by slotsync worker which
|
|
|
|
* doesn't use a replication connection so the default won't work for it.
|
2013-01-05 16:54:06 +01:00
|
|
|
*/
|
|
|
|
if (writerecoveryconf)
|
2024-03-21 06:18:59 +01:00
|
|
|
recoveryconfcontents = GenerateRecoveryConfig(conn,
|
|
|
|
replication_slot,
|
|
|
|
GetDbnameFromConnectionOptions());
|
2013-01-05 16:54:06 +01:00
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
/*
|
|
|
|
* Run IDENTIFY_SYSTEM so we can get the timeline
|
|
|
|
*/
|
2014-10-01 17:22:21 +02:00
|
|
|
if (!RunIdentifySystem(conn, &sysidentifier, &latesttli, NULL, NULL))
|
2018-12-29 13:21:47 +01:00
|
|
|
exit(1);
|
2011-10-26 20:13:33 +02:00
|
|
|
|
2011-02-03 13:46:23 +01:00
|
|
|
/*
|
2023-12-20 15:49:12 +01:00
|
|
|
* If the user wants an incremental backup, we must upload the manifest
|
|
|
|
* for the previous backup upon which it is to be based.
|
|
|
|
*/
|
|
|
|
if (incremental_manifest != NULL)
|
|
|
|
{
|
|
|
|
int fd;
|
|
|
|
char mbuf[65536];
|
|
|
|
int nbytes;
|
|
|
|
|
|
|
|
/* Reject if server is too old. */
|
|
|
|
if (serverVersion < MINIMUM_VERSION_FOR_WAL_SUMMARIES)
|
|
|
|
pg_fatal("server does not support incremental backup");
|
|
|
|
|
|
|
|
/* Open the file. */
|
|
|
|
fd = open(incremental_manifest, O_RDONLY | PG_BINARY, 0);
|
|
|
|
if (fd < 0)
|
|
|
|
pg_fatal("could not open file \"%s\": %m", incremental_manifest);
|
|
|
|
|
|
|
|
/* Tell the server what we want to do. */
|
|
|
|
if (PQsendQuery(conn, "UPLOAD_MANIFEST") == 0)
|
|
|
|
pg_fatal("could not send replication command \"%s\": %s",
|
|
|
|
"UPLOAD_MANIFEST", PQerrorMessage(conn));
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_COPY_IN)
|
|
|
|
{
|
|
|
|
if (PQresultStatus(res) == PGRES_FATAL_ERROR)
|
|
|
|
pg_fatal("could not upload manifest: %s",
|
|
|
|
PQerrorMessage(conn));
|
|
|
|
else
|
|
|
|
pg_fatal("could not upload manifest: unexpected status %s",
|
|
|
|
PQresStatus(PQresultStatus(res)));
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Loop, reading from the file and sending the data to the server. */
|
|
|
|
while ((nbytes = read(fd, mbuf, sizeof mbuf)) > 0)
|
|
|
|
{
|
|
|
|
if (PQputCopyData(conn, mbuf, nbytes) < 0)
|
|
|
|
pg_fatal("could not send COPY data: %s",
|
|
|
|
PQerrorMessage(conn));
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Bail out if we exited the loop due to an error. */
|
|
|
|
if (nbytes < 0)
|
|
|
|
pg_fatal("could not read file \"%s\": %m", incremental_manifest);
|
|
|
|
|
|
|
|
/* End the COPY operation. */
|
|
|
|
if (PQputCopyEnd(conn, NULL) < 0)
|
|
|
|
pg_fatal("could not send end-of-COPY: %s",
|
|
|
|
PQerrorMessage(conn));
|
|
|
|
|
|
|
|
/* See whether the server is happy with what we sent. */
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) == PGRES_FATAL_ERROR)
|
|
|
|
pg_fatal("could not upload manifest: %s",
|
|
|
|
PQerrorMessage(conn));
|
|
|
|
else if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
|
|
|
pg_fatal("could not upload manifest: unexpected status %s",
|
|
|
|
PQresStatus(PQresultStatus(res)));
|
|
|
|
|
|
|
|
/* Consume ReadyForQuery message from server. */
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (res != NULL)
|
|
|
|
pg_fatal("unexpected extra result while sending manifest");
|
|
|
|
|
|
|
|
/* Add INCREMENTAL option to BASE_BACKUP command. */
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "INCREMENTAL");
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Continue building up the options list for the BASE_BACKUP command.
|
2011-02-03 13:46:23 +01:00
|
|
|
*/
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
|
|
|
|
if (estimatesize)
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
|
|
|
|
if (includewal == FETCH_WAL)
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
|
|
|
|
if (fastcheckpoint)
|
|
|
|
{
|
|
|
|
if (use_new_option_syntax)
|
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"CHECKPOINT", "fast");
|
|
|
|
else
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "FAST");
|
|
|
|
}
|
|
|
|
if (includewal != NO_WAL)
|
|
|
|
{
|
|
|
|
if (use_new_option_syntax)
|
|
|
|
AppendIntegerCommandOption(&buf, use_new_option_syntax, "WAIT", 0);
|
|
|
|
else
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "NOWAIT");
|
|
|
|
}
|
2014-02-27 22:55:57 +01:00
|
|
|
if (maxrate > 0)
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
AppendIntegerCommandOption(&buf, use_new_option_syntax, "MAX_RATE",
|
|
|
|
maxrate);
|
|
|
|
if (format == 't')
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
|
|
|
|
if (!verify_checksums)
|
|
|
|
{
|
|
|
|
if (use_new_option_syntax)
|
|
|
|
AppendIntegerCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"VERIFY_CHECKSUMS", 0);
|
|
|
|
else
|
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"NOVERIFY_CHECKSUMS");
|
|
|
|
}
|
2014-02-27 22:55:57 +01:00
|
|
|
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
if (manifest)
|
|
|
|
{
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax, "MANIFEST",
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
manifest_force_encode ? "force-encode" : "yes");
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
if (manifest_checksums != NULL)
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
"MANIFEST_CHECKSUMS", manifest_checksums);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target != NULL)
|
|
|
|
{
|
|
|
|
char *colon;
|
|
|
|
|
|
|
|
if (serverMajor < 1500)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("backup targets are not supported by this server version");
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
|
2022-01-27 15:46:17 +01:00
|
|
|
if (writerecoveryconf)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("recovery configuration cannot be written when a backup target is used");
|
2022-01-27 15:46:17 +01:00
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
AppendPlainCommandOption(&buf, use_new_option_syntax, "TABLESPACE_MAP");
|
|
|
|
|
|
|
|
if ((colon = strchr(backup_target, ':')) == NULL)
|
|
|
|
{
|
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"TARGET", backup_target);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
char *target;
|
|
|
|
|
|
|
|
target = pnstrdup(backup_target, colon - backup_target);
|
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"TARGET", target);
|
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"TARGET_DETAIL", colon + 1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else if (serverMajor >= 1500)
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"TARGET", "client");
|
|
|
|
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
if (compressloc == COMPRESS_LOCATION_SERVER)
|
|
|
|
{
|
|
|
|
if (!use_new_option_syntax)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("server does not support server-side compression");
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
2022-03-23 14:19:14 +01:00
|
|
|
"COMPRESSION", compression_algorithm);
|
|
|
|
if (compression_detail != NULL)
|
|
|
|
AppendStringCommandOption(&buf, use_new_option_syntax,
|
|
|
|
"COMPRESSION_DETAIL",
|
|
|
|
compression_detail);
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
}
|
|
|
|
|
2017-02-26 21:27:51 +01:00
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("initiating base backup, waiting for checkpoint to complete");
|
2017-02-26 21:27:51 +01:00
|
|
|
|
|
|
|
if (showprogress && !verbose)
|
2017-12-01 15:21:34 +01:00
|
|
|
{
|
2022-09-25 00:38:35 +02:00
|
|
|
fprintf(stderr, _("waiting for checkpoint"));
|
2017-12-01 15:21:34 +01:00
|
|
|
if (isatty(fileno(stderr)))
|
|
|
|
fprintf(stderr, "\r");
|
|
|
|
else
|
|
|
|
fprintf(stderr, "\n");
|
|
|
|
}
|
2017-02-26 21:27:51 +01:00
|
|
|
|
Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.
In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.
This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.
Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.
Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 17:50:21 +02:00
|
|
|
if (use_new_option_syntax && buf.len > 0)
|
|
|
|
basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
|
|
|
|
else
|
|
|
|
basebkp = psprintf("BASE_BACKUP %s", buf.data);
|
2014-02-27 22:55:57 +01:00
|
|
|
|
2023-12-20 15:49:12 +01:00
|
|
|
/* OK, try to start the backup. */
|
2014-02-27 22:55:57 +01:00
|
|
|
if (PQsendQuery(conn, basebkp) == 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not send replication command \"%s\": %s",
|
|
|
|
"BASE_BACKUP", PQerrorMessage(conn));
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
/*
|
2017-05-12 19:51:27 +02:00
|
|
|
* Get the starting WAL location
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_TUPLES_OK)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not initiate base backup: %s",
|
|
|
|
PQerrorMessage(conn));
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
if (PQntuples(res) != 1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("server returned unexpected response to BASE_BACKUP command; got %d rows and %d fields, expected %d rows and %d fields",
|
|
|
|
PQntuples(res), PQnfields(res), 1, 2);
|
Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_STREAMING command.
When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.
This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding the positions.
2013-01-17 19:23:00 +01:00
|
|
|
|
2014-02-17 17:20:21 +01:00
|
|
|
strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
|
2013-05-29 22:58:43 +02:00
|
|
|
|
2017-02-26 21:27:51 +01:00
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("checkpoint completed");
|
2017-02-26 21:27:51 +01:00
|
|
|
|
Make pg_basebackup work with pre-9.3 servers, and add server version check.
A new 'starttli' field was added to the response of BASE_BACKUP command.
Make pg_basebackup tolerate the case that it's missing, so that it still
works with older servers.
Add an explicit check for the server version, so that you get a nicer error
message if you try to use it with a pre-9.1 server.
The streaming protocol message format changed in 9.3, so -X stream still won't
work with pre-9.3 servers. I added a version check to ReceiveXLogStream()
earlier, but write that slightly differently, so that in 9.4, it will still
work with a 9.3 server. (In 9.4, the error message needs to be adjusted to
"9.3 or above", though). Also, if the version check fails, don't retry.
2013-03-22 12:02:59 +01:00
|
|
|
/*
|
|
|
|
* 9.3 and later sends the TLI of the starting point. With older servers,
|
|
|
|
* assume it's the same as the latest timeline reported by
|
|
|
|
* IDENTIFY_SYSTEM.
|
|
|
|
*/
|
|
|
|
if (PQnfields(res) >= 2)
|
|
|
|
starttli = atoi(PQgetvalue(res, 0, 1));
|
|
|
|
else
|
|
|
|
starttli = latesttli;
|
2011-02-03 13:46:23 +01:00
|
|
|
PQclear(res);
|
|
|
|
|
2017-01-09 16:03:47 +01:00
|
|
|
if (verbose && includewal != NO_WAL)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("write-ahead log start point: %s on timeline %u",
|
|
|
|
xlogstart, starttli);
|
Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_STREAMING command.
When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.
This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding the positions.
2013-01-17 19:23:00 +01:00
|
|
|
|
2011-02-03 13:46:23 +01:00
|
|
|
/*
|
|
|
|
* Get the header
|
|
|
|
*/
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_TUPLES_OK)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not get backup header: %s",
|
|
|
|
PQerrorMessage(conn));
|
2011-01-23 12:21:23 +01:00
|
|
|
if (PQntuples(res) < 1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("no data returned from server");
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Sum up the total size, for progress reporting
|
|
|
|
*/
|
2019-09-03 11:59:36 +02:00
|
|
|
totalsize_kb = totaldone = 0;
|
2011-01-23 12:21:23 +01:00
|
|
|
tablespacecount = PQntuples(res);
|
|
|
|
for (i = 0; i < PQntuples(res); i++)
|
|
|
|
{
|
2019-09-03 11:59:36 +02:00
|
|
|
totalsize_kb += atol(PQgetvalue(res, i, 2));
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Verify tablespace directories are empty. Don't bother with the
|
|
|
|
* first once since it can be relocated, and it will be checked before
|
|
|
|
* we do anything anyway.
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
*
|
|
|
|
* Note that this is skipped for tar format backups and backups that
|
|
|
|
* the server is storing to a target location, since in that case we
|
|
|
|
* won't be storing anything into these directories and thus should
|
|
|
|
* not create them.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target == NULL && format == 'p' && !PQgetisnull(res, i, 1))
|
2014-02-22 19:38:06 +01:00
|
|
|
{
|
Fix pg_basebackup with in-place tablespaces some more.
Commit c6f2f01611d4f2c412e92eb7893f76fa590818e8 purported to make
this work, but problems remained. In a plain-format backup, the
files from an in-place tablespace got included in the tar file for
the main tablespace, which is wrong but it's not clear that it
has any user-visible consequences. In a tar-format backup, the
TABLESPACE_MAP option is used, and so we never iterated over
pg_tblspc and thus never backed up the in-place tablespaces
anywhere at all.
To fix this, reverse the changes in that commit, so that when we scan
pg_tblspc during a backup, we create tablespaceinfo objects even for
in-place tablespaces. We set the field that would normally contain the
absolute pathname to the relative path pg_tblspc/${TSOID}, and that's
good enough to make basebackup.c happy without any further changes.
However, pg_basebackup needs a couple of adjustments to make it work.
First, it needs to understand that a relative path for a tablespace
means it's an in-place tablespace. Second, it needs to tolerate the
situation where restoring the main tablespace tries to create
pg_tblspc or a subdirectory and finds that it already exists, because
we restore user-defined tablespaces before the main tablespace.
Since in-place tablespaces are only intended for use in development
and testing, no back-patch.
Patch by me, reviewed by Thomas Munro and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmobwvbEp+fLq2PykMYzizcvuNv0a7gPMJtxOTMOuuRLMHg@mail.gmail.com
2023-04-18 17:23:34 +02:00
|
|
|
char *path = PQgetvalue(res, i, 1);
|
|
|
|
|
|
|
|
if (is_absolute_path(path))
|
|
|
|
path = unconstify(char *, get_tablespace_mapping(path));
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* This is an in-place tablespace, so prepend basedir. */
|
|
|
|
path = psprintf("%s/%s", basedir, path);
|
|
|
|
}
|
2014-05-06 18:12:18 +02:00
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs);
|
2014-02-22 19:38:06 +01:00
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When writing to stdout, require a single tablespace
|
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
writing_to_stdout = format == 't' && basedir != NULL &&
|
|
|
|
strcmp(basedir, "-") == 0;
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
if (writing_to_stdout && PQntuples(res) > 1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("can only write single tablespace to stdout, database has %d",
|
|
|
|
PQntuples(res));
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
/*
|
|
|
|
* If we're streaming WAL, start the streaming session before we start
|
|
|
|
* receiving the actual data chunks.
|
|
|
|
*/
|
2017-01-09 16:03:47 +01:00
|
|
|
if (includewal == STREAM_WAL)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
2022-04-12 10:28:17 +02:00
|
|
|
pg_compress_algorithm wal_compress_algorithm;
|
2022-03-23 14:19:14 +01:00
|
|
|
int wal_compress_level;
|
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("starting background WAL receiver");
|
2022-03-23 14:19:14 +01:00
|
|
|
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
if (client_compress->algorithm == PG_COMPRESSION_GZIP)
|
2022-03-23 14:19:14 +01:00
|
|
|
{
|
2022-04-12 10:28:17 +02:00
|
|
|
wal_compress_algorithm = PG_COMPRESSION_GZIP;
|
Simplify handling of compression level with compression specifications
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given. This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification. This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it. zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it. It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).
This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods. Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip. This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.
Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
2022-09-14 05:16:57 +02:00
|
|
|
wal_compress_level = client_compress->level;
|
2022-03-23 14:19:14 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2022-04-12 10:28:17 +02:00
|
|
|
wal_compress_algorithm = PG_COMPRESSION_NONE;
|
2022-03-23 14:19:14 +01:00
|
|
|
wal_compress_level = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
StartLogStreamer(xlogstart, starttli, sysidentifier,
|
Simplify handling of compression level with compression specifications
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given. This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification. This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it. zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it. It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).
This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods. Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip. This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.
Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
2022-09-14 05:16:57 +02:00
|
|
|
wal_compress_algorithm,
|
|
|
|
wal_compress_level);
|
2011-10-26 20:13:33 +02:00
|
|
|
}
|
|
|
|
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
if (serverMajor >= 1500)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
/* Receive a single tar stream with everything. */
|
2022-03-23 14:19:14 +01:00
|
|
|
ReceiveArchiveStream(conn, client_compress);
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Receive a tar file for each tablespace in turn */
|
|
|
|
for (i = 0; i < PQntuples(res); i++)
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
{
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
char archive_name[MAXPGPATH];
|
|
|
|
char *spclocation;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we write the data out to a tar file, it will be named
|
|
|
|
* base.tar if it's the main data directory or <tablespaceoid>.tar
|
|
|
|
* if it's for another tablespace. CreateBackupStreamer() will
|
2023-02-09 06:43:53 +01:00
|
|
|
* arrange to add an extension to the archive name if
|
|
|
|
* pg_basebackup is performing compression, depending on the
|
|
|
|
* compression type.
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
*/
|
|
|
|
if (PQgetisnull(res, i, 0))
|
|
|
|
{
|
|
|
|
strlcpy(archive_name, "base.tar", sizeof(archive_name));
|
|
|
|
spclocation = NULL;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
snprintf(archive_name, sizeof(archive_name),
|
|
|
|
"%s.tar", PQgetvalue(res, i, 0));
|
|
|
|
spclocation = PQgetvalue(res, i, 1);
|
|
|
|
}
|
|
|
|
|
2022-03-23 14:19:14 +01:00
|
|
|
ReceiveTarFile(conn, archive_name, spclocation, i,
|
|
|
|
client_compress);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
}
|
|
|
|
|
Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.
Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 19:47:26 +01:00
|
|
|
/*
|
|
|
|
* Now receive backup manifest, if appropriate.
|
|
|
|
*
|
|
|
|
* If we're writing a tarfile to stdout, ReceiveTarFile will have
|
|
|
|
* already processed the backup manifest and included it in the output
|
|
|
|
* tarfile. Such a configuration doesn't allow for writing multiple
|
|
|
|
* files.
|
|
|
|
*
|
|
|
|
* If we're talking to an older server, it won't send a backup
|
|
|
|
* manifest, so don't try to receive one.
|
|
|
|
*/
|
|
|
|
if (!writing_to_stdout && manifest)
|
|
|
|
ReceiveBackupManifest(conn);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
}
|
2011-01-23 12:21:23 +01:00
|
|
|
|
|
|
|
if (showprogress)
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
{
|
2022-02-17 21:03:40 +01:00
|
|
|
progress_update_filename(NULL);
|
Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
pg_basebackup knows how to do quite a few things with a backup that it
gets from the server, like just write out the files, or compress them
first, or even parse the tar format and inject a modified
postgresql.auto.conf file into the archive generated by the server.
Unforatunely, this makes pg_basebackup.c a very large source file, and
also somewhat difficult to enhance, because for example the knowledge
that the server is sending us a 'tar' file rather than some other sort
of archive is spread all over the place rather than centralized.
In an effort to improve this situation, this commit invents a new
'bbstreamer' abstraction. Each archive received from the server is
fed to a bbstreamer which may choose to dispose of it or pass it
along to some other bbstreamer. Chunks may also be "labelled"
according to whether they are part of the payload data of a file
in the archive or part of the archive metadata.
So, for example, if we want to take a tar file, modify the
postgresql.auto.conf file it contains, and the gzip the result
and write it out, we can use a bbstreamer_tar_parser to parse the
tar file received from the server, a bbstreamer_recovery_injector
to modify the contents of postgresql.auto.conf, a
bbstreamer_tar_archiver to replace the tar headers for the file
modified in the previous step with newly-built ones that are
correct for the modified file, and a bbstreamer_gzip_writer to
gzip and write the resulting data. Only the objects with "tar"
in the name know anything about the tar archive format, and in
theory we could re-archive using some other format rather than
"tar" if somebody wanted to write the code.
These chances do add a substantial amount of code, but I think the
result is a lot more maintainable and extensible. pg_basebackup.c
itself shrinks by roughly a third, with a lot of the complexity
previously contained there moving into the newly-added files.
Patch by me. The larger patch series of which this is a part has been
reviewed and tested at various times by Andres Freund, Sumanta
Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja,
Mark Dilger, Sergei Kornilov, and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 15:22:07 +01:00
|
|
|
progress_report(PQntuples(res), true, true);
|
|
|
|
}
|
2014-02-22 19:38:06 +01:00
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
PQclear(res);
|
|
|
|
|
2011-02-03 13:46:23 +01:00
|
|
|
/*
|
|
|
|
* Get the stop position
|
|
|
|
*/
|
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_TUPLES_OK)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("backup failed: %s",
|
|
|
|
PQerrorMessage(conn));
|
2011-02-03 13:46:23 +01:00
|
|
|
if (PQntuples(res) != 1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("no write-ahead log end position returned from server");
|
2014-02-17 17:20:21 +01:00
|
|
|
strlcpy(xlogend, PQgetvalue(res, 0, 0), sizeof(xlogend));
|
2017-01-09 16:03:47 +01:00
|
|
|
if (verbose && includewal != NO_WAL)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("write-ahead log end point: %s", xlogend);
|
2011-02-03 13:46:23 +01:00
|
|
|
PQclear(res);
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
res = PQgetResult(conn);
|
|
|
|
if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
|
|
|
{
|
2018-04-03 13:47:16 +02:00
|
|
|
const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
|
|
|
|
|
|
|
|
if (sqlstate &&
|
|
|
|
strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("checksum error occurred");
|
2018-04-03 13:47:16 +02:00
|
|
|
checksum_failure = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("final receive failed: %s",
|
|
|
|
PQerrorMessage(conn));
|
2018-04-03 13:47:16 +02:00
|
|
|
}
|
2018-12-29 13:21:47 +01:00
|
|
|
exit(1);
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
2011-10-26 20:13:33 +02:00
|
|
|
if (bgchild > 0)
|
|
|
|
{
|
|
|
|
#ifndef WIN32
|
2011-12-11 00:15:15 +01:00
|
|
|
int status;
|
Modernize our code for looking up descriptive strings for Unix signals.
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers. We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.
Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.
To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available. The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal". All extant callers print the numeric
signal number too, so no need to work harder than that.
Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.
Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
2018-12-17 01:38:57 +01:00
|
|
|
pid_t r;
|
2011-12-11 00:15:15 +01:00
|
|
|
#else
|
|
|
|
DWORD status;
|
2016-06-10 00:02:36 +02:00
|
|
|
|
2016-04-29 13:59:47 +02:00
|
|
|
/*
|
|
|
|
* get a pointer sized version of bgchild to avoid warnings about
|
|
|
|
* casting to a different size on WIN64.
|
|
|
|
*/
|
|
|
|
intptr_t bgchild_handle = bgchild;
|
2012-06-24 21:14:31 +02:00
|
|
|
uint32 hi,
|
|
|
|
lo;
|
2011-10-26 20:13:33 +02:00
|
|
|
#endif
|
|
|
|
|
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("waiting for background process to finish streaming ...");
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
#ifndef WIN32
|
2012-03-29 05:24:07 +02:00
|
|
|
if (write(bgpipe[1], xlogend, strlen(xlogend)) != strlen(xlogend))
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not send command to background pipe: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
|
|
|
|
/* Just wait for the background process to exit */
|
|
|
|
r = waitpid(bgchild, &status, 0);
|
Modernize our code for looking up descriptive strings for Unix signals.
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers. We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.
Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.
To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available. The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal". All extant callers print the numeric
signal number too, so no need to work harder than that.
Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.
Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
2018-12-17 01:38:57 +01:00
|
|
|
if (r == (pid_t) -1)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not wait for child process: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
if (r != bgchild)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("child %d died, expected %d", (int) r, (int) bgchild);
|
Modernize our code for looking up descriptive strings for Unix signals.
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers. We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.
Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.
To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available. The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal". All extant callers print the numeric
signal number too, so no need to work harder than that.
Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.
Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
2018-12-17 01:38:57 +01:00
|
|
|
if (status != 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("%s", wait_result_to_str(status));
|
2011-10-26 20:13:33 +02:00
|
|
|
/* Exited normally, we're happy! */
|
|
|
|
#else /* WIN32 */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* On Windows, since we are in the same process, we can just store the
|
|
|
|
* value directly in the variable, and then set the flag that says
|
|
|
|
* it's there.
|
|
|
|
*/
|
2012-06-24 21:14:31 +02:00
|
|
|
if (sscanf(xlogend, "%X/%X", &hi, &lo) != 2)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not parse write-ahead log location \"%s\"",
|
|
|
|
xlogend);
|
2012-06-24 21:14:31 +02:00
|
|
|
xlogendptr = ((uint64) hi) << 32 | lo;
|
2011-10-26 20:13:33 +02:00
|
|
|
InterlockedIncrement(&has_xlogendptr);
|
|
|
|
|
|
|
|
/* First wait for the thread to exit */
|
2016-04-29 13:59:47 +02:00
|
|
|
if (WaitForSingleObjectEx((HANDLE) bgchild_handle, INFINITE, FALSE) !=
|
2012-07-31 16:11:11 +02:00
|
|
|
WAIT_OBJECT_0)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
|
|
|
_dosmaperr(GetLastError());
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not wait for child thread: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
}
|
2016-04-29 13:59:47 +02:00
|
|
|
if (GetExitCodeThread((HANDLE) bgchild_handle, &status) == 0)
|
2011-10-26 20:13:33 +02:00
|
|
|
{
|
|
|
|
_dosmaperr(GetLastError());
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not get child thread exit status: %m");
|
2011-10-26 20:13:33 +02:00
|
|
|
}
|
|
|
|
if (status != 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("child thread exited with error %u",
|
|
|
|
(unsigned int) status);
|
2011-10-26 20:13:33 +02:00
|
|
|
/* Exited normally, we're happy */
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2018-11-25 16:31:16 +01:00
|
|
|
/* Free the configuration file contents */
|
2013-01-05 16:54:06 +01:00
|
|
|
destroyPQExpBuffer(recoveryconfcontents);
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
/*
|
|
|
|
* End of copy data. Final result is already checked inside the loop.
|
|
|
|
*/
|
2012-01-20 13:57:02 +01:00
|
|
|
PQclear(res);
|
2011-01-23 12:21:23 +01:00
|
|
|
PQfinish(conn);
|
2018-12-29 13:21:47 +01:00
|
|
|
conn = NULL;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2016-09-29 18:00:00 +02:00
|
|
|
/*
|
|
|
|
* Make data persistent on disk once backup is completed. For tar format
|
2019-09-04 06:21:11 +02:00
|
|
|
* sync the parent directory and all its contents as each tar file was not
|
|
|
|
* synced after being completed. In plain format, all the data of the
|
|
|
|
* base directory is synced, taking into account all the tablespaces.
|
2016-09-29 18:00:00 +02:00
|
|
|
* Errors are not considered fatal.
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
*
|
|
|
|
* If, however, there's a backup target, we're not writing anything
|
|
|
|
* locally, so in that case we skip this step.
|
2016-09-29 18:00:00 +02:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (do_sync && backup_target == NULL)
|
2016-09-29 18:00:00 +02:00
|
|
|
{
|
2018-07-29 00:53:11 +02:00
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("syncing data to disk ...");
|
2016-09-29 18:00:00 +02:00
|
|
|
if (format == 't')
|
|
|
|
{
|
|
|
|
if (strcmp(basedir, "-") != 0)
|
2023-09-07 01:27:00 +02:00
|
|
|
(void) sync_dir_recurse(basedir, sync_method);
|
2016-09-29 18:00:00 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2023-09-07 01:27:00 +02:00
|
|
|
(void) sync_pgdata(basedir, serverVersion, sync_method);
|
2016-09-29 18:00:00 +02:00
|
|
|
}
|
2016-09-29 18:00:00 +02:00
|
|
|
}
|
|
|
|
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
/*
|
|
|
|
* After synchronizing data to disk, perform a durable rename of
|
|
|
|
* backup_manifest.tmp to backup_manifest, if we wrote such a file. This
|
|
|
|
* way, a failure or system crash before we reach this point will leave us
|
|
|
|
* without a backup_manifest file, decreasing the chances that a directory
|
|
|
|
* we leave behind will be mistaken for a valid backup.
|
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (!writing_to_stdout && manifest && backup_target == NULL)
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
{
|
|
|
|
char tmp_filename[MAXPGPATH];
|
|
|
|
char filename[MAXPGPATH];
|
|
|
|
|
|
|
|
if (verbose)
|
|
|
|
pg_log_info("renaming backup_manifest.tmp to backup_manifest");
|
|
|
|
|
|
|
|
snprintf(tmp_filename, MAXPGPATH, "%s/backup_manifest.tmp", basedir);
|
|
|
|
snprintf(filename, MAXPGPATH, "%s/backup_manifest", basedir);
|
|
|
|
|
2022-01-23 22:59:23 +01:00
|
|
|
if (do_sync)
|
|
|
|
{
|
|
|
|
/* durable_rename emits its own log message in case of failure */
|
|
|
|
if (durable_rename(tmp_filename, filename) != 0)
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
if (rename(tmp_filename, filename) != 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not rename file \"%s\" to \"%s\": %m",
|
|
|
|
tmp_filename, filename);
|
2022-01-23 22:59:23 +01:00
|
|
|
}
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
}
|
|
|
|
|
2011-01-23 12:21:23 +01:00
|
|
|
if (verbose)
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_info("base backup completed");
|
2011-01-23 12:21:23 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
int
|
|
|
|
main(int argc, char **argv)
|
|
|
|
{
|
|
|
|
static struct option long_options[] = {
|
|
|
|
{"help", no_argument, NULL, '?'},
|
|
|
|
{"version", no_argument, NULL, 'V'},
|
|
|
|
{"pgdata", required_argument, NULL, 'D'},
|
|
|
|
{"format", required_argument, NULL, 'F'},
|
2023-12-20 15:49:12 +01:00
|
|
|
{"incremental", required_argument, NULL, 'i'},
|
2011-01-23 12:21:23 +01:00
|
|
|
{"checkpoint", required_argument, NULL, 'c'},
|
2017-09-26 22:07:52 +02:00
|
|
|
{"create-slot", no_argument, NULL, 'C'},
|
2014-02-27 22:55:57 +01:00
|
|
|
{"max-rate", required_argument, NULL, 'r'},
|
2013-01-05 16:54:06 +01:00
|
|
|
{"write-recovery-conf", no_argument, NULL, 'R'},
|
2015-07-22 03:06:45 +02:00
|
|
|
{"slot", required_argument, NULL, 'S'},
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
{"target", required_argument, NULL, 't'},
|
2014-02-22 19:38:06 +01:00
|
|
|
{"tablespace-mapping", required_argument, NULL, 'T'},
|
2017-02-09 22:42:51 +01:00
|
|
|
{"wal-method", required_argument, NULL, 'X'},
|
2011-05-30 00:02:02 +02:00
|
|
|
{"gzip", no_argument, NULL, 'z'},
|
2011-01-23 12:21:23 +01:00
|
|
|
{"compress", required_argument, NULL, 'Z'},
|
|
|
|
{"label", required_argument, NULL, 'l'},
|
2016-10-19 18:00:00 +02:00
|
|
|
{"no-clean", no_argument, NULL, 'n'},
|
|
|
|
{"no-sync", no_argument, NULL, 'N'},
|
2013-02-25 13:48:27 +01:00
|
|
|
{"dbname", required_argument, NULL, 'd'},
|
2011-01-23 12:21:23 +01:00
|
|
|
{"host", required_argument, NULL, 'h'},
|
|
|
|
{"port", required_argument, NULL, 'p'},
|
|
|
|
{"username", required_argument, NULL, 'U'},
|
|
|
|
{"no-password", no_argument, NULL, 'w'},
|
|
|
|
{"password", no_argument, NULL, 'W'},
|
2012-07-31 16:11:11 +02:00
|
|
|
{"status-interval", required_argument, NULL, 's'},
|
2011-01-23 12:21:23 +01:00
|
|
|
{"verbose", no_argument, NULL, 'v'},
|
|
|
|
{"progress", no_argument, NULL, 'P'},
|
2017-02-09 22:42:51 +01:00
|
|
|
{"waldir", required_argument, NULL, 1},
|
2017-01-16 13:56:43 +01:00
|
|
|
{"no-slot", no_argument, NULL, 2},
|
2018-05-21 16:01:49 +02:00
|
|
|
{"no-verify-checksums", no_argument, NULL, 3},
|
2020-03-19 09:09:00 +01:00
|
|
|
{"no-estimate-size", no_argument, NULL, 4},
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
{"no-manifest", no_argument, NULL, 5},
|
|
|
|
{"manifest-force-encode", no_argument, NULL, 6},
|
|
|
|
{"manifest-checksums", required_argument, NULL, 7},
|
Allow using syncfs() in frontend utilities.
This commit allows specifying a --sync-method in several frontend
utilities that must synchronize many files to disk (initdb,
pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade).
On Linux, users can specify "syncfs" to synchronize the relevant
file systems instead of calling fsync() for every single file. In
many cases, using syncfs() is much faster.
As with recovery_init_sync_method, this new option comes with some
caveats. The descriptions of these caveats have been moved to a
new appendix section in the documentation.
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier, Thomas Munro, Robert Haas, Justin Pryzby
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
2023-09-07 01:27:16 +02:00
|
|
|
{"sync-method", required_argument, NULL, 8},
|
2011-01-23 12:21:23 +01:00
|
|
|
{NULL, 0, NULL, 0}
|
|
|
|
};
|
|
|
|
int c;
|
|
|
|
|
|
|
|
int option_index;
|
2022-03-23 14:19:14 +01:00
|
|
|
char *compression_algorithm = "none";
|
|
|
|
char *compression_detail = NULL;
|
2023-12-20 15:49:12 +01:00
|
|
|
char *incremental_manifest = NULL;
|
2022-03-23 14:19:14 +01:00
|
|
|
CompressionLocation compressloc = COMPRESS_LOCATION_UNSPECIFIED;
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
pg_compress_specification client_compress;
|
2011-01-23 12:21:23 +01:00
|
|
|
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_logging_init(argv[0]);
|
2011-01-23 12:21:23 +01:00
|
|
|
progname = get_progname(argv[0]);
|
|
|
|
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_basebackup"));
|
|
|
|
|
|
|
|
if (argc > 1)
|
|
|
|
{
|
|
|
|
if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
|
|
|
|
{
|
|
|
|
usage();
|
|
|
|
exit(0);
|
|
|
|
}
|
|
|
|
else if (strcmp(argv[1], "-V") == 0
|
|
|
|
|| strcmp(argv[1], "--version") == 0)
|
|
|
|
{
|
|
|
|
puts("pg_basebackup (PostgreSQL) " PG_VERSION);
|
|
|
|
exit(0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
atexit(cleanup_directories_atexit);
|
|
|
|
|
2023-12-20 15:49:12 +01:00
|
|
|
while ((c = getopt_long(argc, argv, "c:Cd:D:F:h:i:l:nNp:Pr:Rs:S:t:T:U:vwWX:zZ:",
|
2011-01-23 12:21:23 +01:00
|
|
|
long_options, &option_index)) != -1)
|
|
|
|
{
|
|
|
|
switch (c)
|
|
|
|
{
|
2022-12-12 14:33:41 +01:00
|
|
|
case 'c':
|
|
|
|
if (pg_strcasecmp(optarg, "fast") == 0)
|
|
|
|
fastcheckpoint = true;
|
|
|
|
else if (pg_strcasecmp(optarg, "spread") == 0)
|
|
|
|
fastcheckpoint = false;
|
|
|
|
else
|
|
|
|
pg_fatal("invalid checkpoint argument \"%s\", must be \"fast\" or \"spread\"",
|
|
|
|
optarg);
|
|
|
|
break;
|
2017-09-26 22:07:52 +02:00
|
|
|
case 'C':
|
|
|
|
create_slot = true;
|
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 'd':
|
|
|
|
connection_string = pg_strdup(optarg);
|
|
|
|
break;
|
2011-01-23 12:21:23 +01:00
|
|
|
case 'D':
|
2012-10-02 21:35:10 +02:00
|
|
|
basedir = pg_strdup(optarg);
|
2011-01-23 12:21:23 +01:00
|
|
|
break;
|
|
|
|
case 'F':
|
|
|
|
if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
|
|
|
|
format = 'p';
|
|
|
|
else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
|
|
|
|
format = 't';
|
|
|
|
else
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid output format \"%s\", must be \"plain\" or \"tar\"",
|
|
|
|
optarg);
|
2011-01-23 12:21:23 +01:00
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 'h':
|
|
|
|
dbhost = pg_strdup(optarg);
|
|
|
|
break;
|
2023-12-20 15:49:12 +01:00
|
|
|
case 'i':
|
|
|
|
incremental_manifest = pg_strdup(optarg);
|
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 'l':
|
|
|
|
label = pg_strdup(optarg);
|
|
|
|
break;
|
|
|
|
case 'n':
|
|
|
|
noclean = true;
|
|
|
|
break;
|
|
|
|
case 'N':
|
|
|
|
do_sync = false;
|
|
|
|
break;
|
|
|
|
case 'p':
|
|
|
|
dbport = pg_strdup(optarg);
|
|
|
|
break;
|
|
|
|
case 'P':
|
|
|
|
showprogress = true;
|
|
|
|
break;
|
2014-02-27 22:55:57 +01:00
|
|
|
case 'r':
|
|
|
|
maxrate = parse_max_rate(optarg);
|
|
|
|
break;
|
2013-01-05 16:54:06 +01:00
|
|
|
case 'R':
|
|
|
|
writerecoveryconf = true;
|
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 's':
|
|
|
|
if (!option_parse_int(optarg, "-s/--status-interval", 0,
|
|
|
|
INT_MAX / 1000,
|
|
|
|
&standby_message_timeout))
|
|
|
|
exit(1);
|
|
|
|
standby_message_timeout *= 1000;
|
|
|
|
break;
|
2015-07-22 03:06:45 +02:00
|
|
|
case 'S':
|
2017-01-16 13:56:43 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* When specifying replication slot name, use a permanent
|
|
|
|
* slot.
|
|
|
|
*/
|
2015-07-22 03:06:45 +02:00
|
|
|
replication_slot = pg_strdup(optarg);
|
2017-01-16 13:56:43 +01:00
|
|
|
temp_replication_slot = false;
|
|
|
|
break;
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
case 't':
|
|
|
|
backup_target = pg_strdup(optarg);
|
|
|
|
break;
|
2014-02-22 19:38:06 +01:00
|
|
|
case 'T':
|
|
|
|
tablespace_list_append(optarg);
|
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 'U':
|
|
|
|
dbuser = pg_strdup(optarg);
|
|
|
|
break;
|
|
|
|
case 'v':
|
|
|
|
verbose++;
|
|
|
|
break;
|
|
|
|
case 'w':
|
|
|
|
dbgetpassword = -1;
|
|
|
|
break;
|
|
|
|
case 'W':
|
|
|
|
dbgetpassword = 1;
|
|
|
|
break;
|
2012-06-10 13:43:51 +02:00
|
|
|
case 'X':
|
2017-01-04 10:40:38 +01:00
|
|
|
if (strcmp(optarg, "n") == 0 ||
|
|
|
|
strcmp(optarg, "none") == 0)
|
2012-06-10 13:43:51 +02:00
|
|
|
{
|
2017-01-09 16:03:47 +01:00
|
|
|
includewal = NO_WAL;
|
2012-06-10 13:43:51 +02:00
|
|
|
}
|
2017-01-04 10:40:38 +01:00
|
|
|
else if (strcmp(optarg, "f") == 0 ||
|
2011-10-26 20:13:33 +02:00
|
|
|
strcmp(optarg, "fetch") == 0)
|
2017-01-04 10:40:38 +01:00
|
|
|
{
|
2017-01-09 16:03:47 +01:00
|
|
|
includewal = FETCH_WAL;
|
2017-01-04 10:40:38 +01:00
|
|
|
}
|
2011-10-26 20:13:33 +02:00
|
|
|
else if (strcmp(optarg, "s") == 0 ||
|
|
|
|
strcmp(optarg, "stream") == 0)
|
2017-01-04 10:40:38 +01:00
|
|
|
{
|
2017-01-09 16:03:47 +01:00
|
|
|
includewal = STREAM_WAL;
|
2017-01-04 10:40:38 +01:00
|
|
|
}
|
2011-10-26 20:13:33 +02:00
|
|
|
else
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid wal-method option \"%s\", must be \"fetch\", \"stream\", or \"none\"",
|
|
|
|
optarg);
|
2011-01-30 21:30:09 +01:00
|
|
|
break;
|
2011-05-30 00:02:02 +02:00
|
|
|
case 'z':
|
2022-03-23 14:19:14 +01:00
|
|
|
compression_algorithm = "gzip";
|
|
|
|
compression_detail = NULL;
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
compressloc = COMPRESS_LOCATION_UNSPECIFIED;
|
2011-05-30 00:02:02 +02:00
|
|
|
break;
|
2011-01-23 12:21:23 +01:00
|
|
|
case 'Z':
|
Refactor code parsing compression option values (-Z/--compress)
This commit moves the code in charge of deparsing the method and detail
strings fed later to parse_compress_specification() to a common routine,
where the backward-compatible case of only an integer being found (N
= 0 => "none", N > 1 => gzip at level N) is handled.
Note that this has a side-effect for pg_basebackup, as we now attempt to
detect "server-" and "client-" before checking for the integer-only
pre-14 grammar, where values like server-N and client-N (without the
follow-up detail string) are now valid rather than failing because of an
unsupported method name. Past grammars are still handled the same way,
but these flavors are now authorized, and would now switch to consider N
= 0 as no compression and N > 1 as gzip with the compression level used
as N, with the caller still controlling if the compression method should
be done server-side, client-side or is unspecified. The documentation
of pg_basebackup is updated to reflect that.
This benefits other code paths that would like to rely on the same logic
as pg_basebackup and pg_receivewal with option values used for
compression specifications, one area discussed lately being pg_dump.
Author: Georgios Kokolatos, Michael Paquier
Discussion: https://postgr.es/m/O4mutIrCES8ZhlXJiMvzsivT7ztAMja2lkdL1LJx6O5f22I2W8PBIeLKz7mDLwxHoibcnRAYJXm1pH4tyUNC4a8eDzLn22a6Pb1S74Niexg=@pm.me
2022-11-30 01:34:32 +01:00
|
|
|
backup_parse_compress_options(optarg, &compression_algorithm,
|
|
|
|
&compression_detail, &compressloc);
|
2011-01-23 12:21:23 +01:00
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 1:
|
|
|
|
xlog_dir = pg_strdup(optarg);
|
2011-01-23 12:21:23 +01:00
|
|
|
break;
|
2022-12-12 14:33:41 +01:00
|
|
|
case 2:
|
|
|
|
no_slot = true;
|
2011-01-23 12:21:23 +01:00
|
|
|
break;
|
2018-05-21 16:01:49 +02:00
|
|
|
case 3:
|
2018-04-03 13:47:16 +02:00
|
|
|
verify_checksums = false;
|
|
|
|
break;
|
2020-03-19 09:09:00 +01:00
|
|
|
case 4:
|
|
|
|
estimatesize = false;
|
|
|
|
break;
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
case 5:
|
|
|
|
manifest = false;
|
|
|
|
break;
|
|
|
|
case 6:
|
|
|
|
manifest_force_encode = true;
|
|
|
|
break;
|
|
|
|
case 7:
|
|
|
|
manifest_checksums = pg_strdup(optarg);
|
|
|
|
break;
|
Allow using syncfs() in frontend utilities.
This commit allows specifying a --sync-method in several frontend
utilities that must synchronize many files to disk (initdb,
pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade).
On Linux, users can specify "syncfs" to synchronize the relevant
file systems instead of calling fsync() for every single file. In
many cases, using syncfs() is much faster.
As with recovery_init_sync_method, this new option comes with some
caveats. The descriptions of these caveats have been moved to a
new appendix section in the documentation.
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier, Thomas Munro, Robert Haas, Justin Pryzby
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
2023-09-07 01:27:16 +02:00
|
|
|
case 8:
|
|
|
|
if (!parse_sync_method(optarg, &sync_method))
|
|
|
|
exit(1);
|
|
|
|
break;
|
2011-01-23 12:21:23 +01:00
|
|
|
default:
|
2022-04-08 20:55:14 +02:00
|
|
|
/* getopt_long already emitted a complaint */
|
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2011-01-23 12:21:23 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Any non-option arguments?
|
|
|
|
*/
|
|
|
|
if (optind < argc)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("too many command-line arguments (first is \"%s\")",
|
|
|
|
argv[optind]);
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2011-01-23 12:21:23 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
* Setting the backup target to 'client' is equivalent to leaving out the
|
|
|
|
* option. This logic allows us to assume elsewhere that the backup is
|
|
|
|
* being stored locally if and only if backup_target == NULL.
|
|
|
|
*/
|
|
|
|
if (backup_target != NULL && strcmp(backup_target, "client") == 0)
|
|
|
|
{
|
|
|
|
pg_free(backup_target);
|
|
|
|
backup_target = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Can't use --format with --target. Without --target, default format is
|
|
|
|
* tar.
|
|
|
|
*/
|
|
|
|
if (backup_target != NULL && format != '\0')
|
|
|
|
{
|
|
|
|
pg_log_error("cannot specify both format and backup target");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
if (format == '\0')
|
|
|
|
format = 'p';
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Either directory or backup target should be specified, but not both
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (basedir == NULL && backup_target == NULL)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
pg_log_error("must specify output directory or backup target");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
if (basedir != NULL && backup_target != NULL)
|
|
|
|
{
|
|
|
|
pg_log_error("cannot specify both output directory and backup target");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2011-01-23 12:21:23 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2022-03-23 14:19:14 +01:00
|
|
|
* If the user has not specified where to perform backup compression,
|
|
|
|
* default to the client, unless the user specified --target, in which
|
|
|
|
* case the server is the only choice.
|
2011-01-23 12:21:23 +01:00
|
|
|
*/
|
2022-03-23 14:19:14 +01:00
|
|
|
if (compressloc == COMPRESS_LOCATION_UNSPECIFIED)
|
2011-01-23 12:21:23 +01:00
|
|
|
{
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target == NULL)
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
compressloc = COMPRESS_LOCATION_CLIENT;
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
else
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
compressloc = COMPRESS_LOCATION_SERVER;
|
|
|
|
}
|
|
|
|
|
2022-03-23 14:19:14 +01:00
|
|
|
/*
|
|
|
|
* If any compression that we're doing is happening on the client side, we
|
|
|
|
* must try to parse the compression algorithm and detail, but if it's all
|
|
|
|
* on the server side, then we're just going to pass through whatever was
|
|
|
|
* requested and let the server decide what to do.
|
|
|
|
*/
|
|
|
|
if (compressloc == COMPRESS_LOCATION_CLIENT)
|
|
|
|
{
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
pg_compress_algorithm alg;
|
2022-03-23 14:19:14 +01:00
|
|
|
char *error_detail;
|
|
|
|
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
if (!parse_compress_algorithm(compression_algorithm, &alg))
|
2022-09-25 00:38:35 +02:00
|
|
|
pg_fatal("unrecognized compression algorithm: \"%s\"",
|
2022-04-08 20:55:14 +02:00
|
|
|
compression_algorithm);
|
2022-03-23 14:19:14 +01:00
|
|
|
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
parse_compress_specification(alg, compression_detail, &client_compress);
|
|
|
|
error_detail = validate_compress_specification(&client_compress);
|
2022-03-23 14:19:14 +01:00
|
|
|
if (error_detail != NULL)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("invalid compression specification: %s",
|
|
|
|
error_detail);
|
2022-03-23 14:19:14 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
Assert(compressloc == COMPRESS_LOCATION_SERVER);
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
client_compress.algorithm = PG_COMPRESSION_NONE;
|
2022-03-23 14:19:14 +01:00
|
|
|
client_compress.options = 0;
|
|
|
|
}
|
|
|
|
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
/*
|
|
|
|
* Can't perform client-side compression if the backup is not being sent
|
|
|
|
* to the client.
|
|
|
|
*/
|
|
|
|
if (backup_target != NULL && compressloc == COMPRESS_LOCATION_CLIENT)
|
|
|
|
{
|
|
|
|
pg_log_error("client-side compression is not possible when a backup target is specified");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2022-03-23 14:19:14 +01:00
|
|
|
* Client-side compression doesn't make sense unless tar format is in use.
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
*/
|
2022-03-23 14:19:14 +01:00
|
|
|
if (format == 'p' && compressloc == COMPRESS_LOCATION_CLIENT &&
|
Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 06:38:54 +02:00
|
|
|
client_compress.algorithm != PG_COMPRESSION_NONE)
|
Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.
At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.
Along the way, I removed the information message added by commit
5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.
Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454.
This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for
pg_verifybackup itself.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 21:13:18 +01:00
|
|
|
{
|
|
|
|
pg_log_error("only tar mode backups can be compressed");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2011-01-23 12:21:23 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* Sanity checks for WAL method.
|
|
|
|
*/
|
|
|
|
if (backup_target != NULL && includewal == STREAM_WAL)
|
|
|
|
{
|
|
|
|
pg_log_error("WAL cannot be streamed when a backup target is specified");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
2017-01-09 16:03:47 +01:00
|
|
|
if (format == 't' && includewal == STREAM_WAL && strcmp(basedir, "-") == 0)
|
2016-12-21 12:27:37 +01:00
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("cannot stream write-ahead logs in tar mode to stdout");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2016-12-21 12:27:37 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
2017-01-16 18:20:57 +01:00
|
|
|
if (replication_slot && includewal != STREAM_WAL)
|
2015-07-22 03:06:45 +02:00
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("replication slots can only be used with WAL streaming");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2015-07-22 03:06:45 +02:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* Sanity checks for replication slot options.
|
|
|
|
*/
|
2017-01-16 13:56:43 +01:00
|
|
|
if (no_slot)
|
|
|
|
{
|
|
|
|
if (replication_slot)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("--no-slot cannot be used with slot name");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2017-01-16 13:56:43 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
temp_replication_slot = false;
|
|
|
|
}
|
|
|
|
|
2017-09-26 22:07:52 +02:00
|
|
|
if (create_slot)
|
|
|
|
{
|
|
|
|
if (!replication_slot)
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("%s needs a slot to be specified using --slot",
|
|
|
|
"--create-slot");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2017-09-26 22:07:52 +02:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (no_slot)
|
|
|
|
{
|
2020-06-15 08:22:52 +02:00
|
|
|
pg_log_error("%s and %s are incompatible options",
|
|
|
|
"--create-slot", "--no-slot");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2017-09-26 22:07:52 +02:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* Sanity checks on WAL directory.
|
|
|
|
*/
|
2017-08-31 04:28:36 +02:00
|
|
|
if (xlog_dir)
|
2013-11-27 06:00:16 +01:00
|
|
|
{
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (backup_target != NULL)
|
|
|
|
{
|
|
|
|
pg_log_error("WAL directory location cannot be specified along with a backup target");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
2013-11-27 06:00:16 +01:00
|
|
|
if (format != 'p')
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("WAL directory location can only be specified in plain mode");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2013-11-27 06:00:16 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* clean up xlog directory name, check it's absolute */
|
|
|
|
canonicalize_path(xlog_dir);
|
|
|
|
if (!is_absolute_path(xlog_dir))
|
|
|
|
{
|
Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 14:24:37 +02:00
|
|
|
pg_log_error("WAL directory location must be an absolute path");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2013-11-27 06:00:16 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* Sanity checks for progress reporting options.
|
|
|
|
*/
|
2020-03-19 09:09:00 +01:00
|
|
|
if (showprogress && !estimatesize)
|
|
|
|
{
|
2020-06-15 08:22:52 +02:00
|
|
|
pg_log_error("%s and %s are incompatible options",
|
|
|
|
"--progress", "--no-estimate-size");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
2020-03-19 09:09:00 +01:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
/*
|
|
|
|
* Sanity checks for backup manifest options.
|
|
|
|
*/
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
if (!manifest && manifest_checksums != NULL)
|
|
|
|
{
|
2020-06-15 08:22:52 +02:00
|
|
|
pg_log_error("%s and %s are incompatible options",
|
|
|
|
"--no-manifest", "--manifest-checksums");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!manifest && manifest_force_encode)
|
|
|
|
{
|
2020-06-15 08:22:52 +02:00
|
|
|
pg_log_error("%s and %s are incompatible options",
|
|
|
|
"--no-manifest", "--manifest-force-encode");
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
|
Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 20:59:47 +02:00
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
2016-10-20 17:24:37 +02:00
|
|
|
/* connection in replication mode to server */
|
|
|
|
conn = GetConnection();
|
|
|
|
if (!conn)
|
|
|
|
{
|
|
|
|
/* Error message already written in GetConnection() */
|
|
|
|
exit(1);
|
|
|
|
}
|
2018-12-29 13:21:47 +01:00
|
|
|
atexit(disconnect_atexit);
|
2016-10-20 17:24:37 +02:00
|
|
|
|
2022-02-23 14:24:43 +01:00
|
|
|
#ifndef WIN32
|
2022-05-12 21:17:30 +02:00
|
|
|
|
2022-02-23 14:24:43 +01:00
|
|
|
/*
|
|
|
|
* Trap SIGCHLD to be able to handle the WAL stream process exiting. There
|
|
|
|
* is no SIGCHLD on Windows, there we rely on the background thread
|
|
|
|
* setting the signal variable on unexpected but graceful exit. If the WAL
|
|
|
|
* stream thread crashes on Windows it will bring down the entire process
|
|
|
|
* as it's a thread, so there is nothing to catch should that happen. A
|
|
|
|
* crash on UNIX will be caught by the signal handler.
|
|
|
|
*/
|
|
|
|
pqsignal(SIGCHLD, sigchld_handler);
|
|
|
|
#endif
|
|
|
|
|
2018-04-07 23:45:39 +02:00
|
|
|
/*
|
|
|
|
* Set umask so that directories/files are created with the same
|
|
|
|
* permissions as directories/files in the source data directory.
|
|
|
|
*
|
|
|
|
* pg_mode_mask is set to owner-only by default and then updated in
|
|
|
|
* GetConnection() where we get the mode from the server-side with
|
|
|
|
* RetrieveDataDirCreatePerm() and then call SetDataDirectoryCreatePerm().
|
|
|
|
*/
|
|
|
|
umask(pg_mode_mask);
|
|
|
|
|
Disable silently generation of manifests with servers <= 12 in pg_basebackup
Since 0d8c9c1, pg_basebackup would generate an error if connected to a
backend version older than 12 where backup manifests are not supported.
Avoiding this error is possible by using the --no-manifest option.
This error handling could be confusing for some users, where patching a
backup script that interacts with multiple backend versions would cause
the addition of --no-manifest to potentially not generate a backup
manifest even for Postgres 13 and newer versions. As we want to
encourage the use of backup manifests as much as possible, this commit
silently disables manifests where not supported, instead of generating
an error.
While on it, rework a bit the code to make it more consistent with the
surroundings when generating the BASE_BACKUP command.
Per discussion with Andres Freund, Stephen Frost, Robert Haas, Álvaro
Herrera, Kyotaro Horiguchi, Tom Lane, David Steele, and me.
Author: Michael Paquier
Discussion: https://postgr.es/m/20200410080910.GZ1606@paquier.xyz
2020-04-16 06:57:07 +02:00
|
|
|
/* Backup manifests are supported in 13 and newer versions */
|
|
|
|
if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_MANIFESTS)
|
|
|
|
manifest = false;
|
|
|
|
|
2018-04-07 23:45:39 +02:00
|
|
|
/*
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
* If an output directory was specified, verify that it exists, or create
|
|
|
|
* it. Note that for a tar backup, an output directory of "-" means we are
|
|
|
|
* writing to stdout, so do nothing in that case.
|
2018-04-07 23:45:39 +02:00
|
|
|
*/
|
Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command. If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.
On the server side, we now support two additional types of backup
targets. There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.
Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.
Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2021-11-16 21:20:50 +01:00
|
|
|
if (basedir != NULL && (format == 'p' || strcmp(basedir, "-") != 0))
|
2018-04-07 23:45:39 +02:00
|
|
|
verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata);
|
|
|
|
|
Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-20 07:03:48 +02:00
|
|
|
/* determine remote server's xlog segment size */
|
|
|
|
if (!RetrieveWalSegSize(conn))
|
2018-12-29 13:21:47 +01:00
|
|
|
exit(1);
|
Make WAL segment size configurable at initdb time.
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
2017-09-20 07:03:48 +02:00
|
|
|
|
2017-05-12 17:49:56 +02:00
|
|
|
/* Create pg_wal symlink, if required */
|
2017-08-31 04:28:36 +02:00
|
|
|
if (xlog_dir)
|
2013-11-27 06:00:16 +01:00
|
|
|
{
|
|
|
|
char *linkloc;
|
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
verify_dir_is_empty_or_create(xlog_dir, &made_new_xlogdir, &found_existing_xlogdir);
|
2013-11-27 06:00:16 +01:00
|
|
|
|
2016-10-20 17:24:37 +02:00
|
|
|
/*
|
|
|
|
* Form name of the place where the symlink must go. pg_xlog has been
|
|
|
|
* renamed to pg_wal in post-10 clusters.
|
|
|
|
*/
|
|
|
|
linkloc = psprintf("%s/%s", basedir,
|
|
|
|
PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
|
|
|
|
"pg_xlog" : "pg_wal");
|
2013-11-27 06:00:16 +01:00
|
|
|
|
|
|
|
if (symlink(xlog_dir, linkloc) != 0)
|
2022-04-08 20:55:14 +02:00
|
|
|
pg_fatal("could not create symbolic link \"%s\": %m", linkloc);
|
2013-11-27 06:00:16 +01:00
|
|
|
free(linkloc);
|
|
|
|
}
|
|
|
|
|
2022-03-23 14:19:14 +01:00
|
|
|
BaseBackup(compression_algorithm, compression_detail, compressloc,
|
2023-12-20 15:49:12 +01:00
|
|
|
&client_compress, incremental_manifest);
|
2011-01-23 12:21:23 +01:00
|
|
|
|
2016-09-12 18:00:00 +02:00
|
|
|
success = true;
|
2011-01-23 12:21:23 +01:00
|
|
|
return 0;
|
|
|
|
}
|