/*
 * pg_upgrade.h
 *
 * Copyright (c) 2010-2024, PostgreSQL Global Development Group
 * src/bin/pg_upgrade/pg_upgrade.h
 */

#include <unistd.h>
#include <assert.h>
#include <sys/stat.h>
#include <sys/time.h>

#include "common/relpath.h"
#include "libpq-fe.h"

/* For now, pg_upgrade does not use common/logging.c; use our own pg_fatal */
#undef pg_fatal

/* Use port in the private/dynamic port number range */
#define DEF_PGUPORT 50432

#define MAX_STRING 1024
#define QUERY_ALLOC 8192

#define MESSAGE_WIDTH 62

#define GET_MAJOR_VERSION(v) ((v) / 100)

/* contains both global db information and CREATE DATABASE commands */
#define GLOBALS_DUMP_FILE "pg_upgrade_dump_globals.sql"
#define DB_DUMP_FILE_MASK "pg_upgrade_dump_%u.custom"

/*
 * Base directories that include all the files generated internally, from the
 * root path of the new cluster. The paths are dynamically built as of
 * BASE_OUTPUTDIR/$timestamp/{LOG_OUTPUTDIR,DUMP_OUTPUTDIR} to ensure their
 * uniqueness in each run.
 */
#define BASE_OUTPUTDIR "pg_upgrade_output.d"
#define LOG_OUTPUTDIR "log"
#define DUMP_OUTPUTDIR "dump"

#define DB_DUMP_LOG_FILE_MASK "pg_upgrade_dump_%u.log"
#define SERVER_LOG_FILE "pg_upgrade_server.log"
#define UTILITY_LOG_FILE "pg_upgrade_utility.log"
#define INTERNAL_LOG_FILE "pg_upgrade_internal.log"

extern char *output_files[];

/*
 * WIN32 files do not accept writes from multiple processes
 *
 * On Win32, we can't send both pg_upgrade output and command output to the
 * same file because we get the error: "The process cannot access the file
 * because it is being used by another process." so send the pg_ctl
 * command-line output to a new file, rather than into the server log file.
 * Ideally we could use UTILITY_LOG_FILE for this, but some Windows platforms
 * keep the pg_ctl output file open by the running postmaster, even after
 * pg_ctl exits.
 *
 * We could use the Windows pgwin32_open() flags to allow shared file
 * writes, but it is unclear how all other tools would use those flags, so
 * we just avoid it and log a little differently on Windows; we adjust
 * the error message appropriately.
 */
#ifndef WIN32
#define SERVER_START_LOG_FILE SERVER_LOG_FILE
#define SERVER_STOP_LOG_FILE SERVER_LOG_FILE
#else
#define SERVER_START_LOG_FILE "pg_upgrade_server_start.log"
/*
 * "pg_ctl start" keeps SERVER_START_LOG_FILE and SERVER_LOG_FILE open
 * while the server is running, so we use UTILITY_LOG_FILE for "pg_ctl
 * stop".
 */
#define SERVER_STOP_LOG_FILE UTILITY_LOG_FILE
#endif

#ifndef WIN32
#define pg_mv_file rename
#define PATH_SEPARATOR '/'
#define PATH_QUOTE '\''
#define RM_CMD "rm -f"
#define RMDIR_CMD "rm -rf"
#define SCRIPT_PREFIX "./"
#define SCRIPT_EXT "sh"
#define ECHO_QUOTE "'"
#define ECHO_BLANK ""
#else
#define pg_mv_file pgrename
#define PATH_SEPARATOR '\\'
#define PATH_QUOTE '"'
/* @ prefix disables command echo in .bat files */
#define RM_CMD "@DEL /q"
#define RMDIR_CMD "@RMDIR /s/q"
#define SCRIPT_PREFIX ""
#define SCRIPT_EXT "bat"
#define EXE_EXT ".exe"
#define ECHO_QUOTE ""
#define ECHO_BLANK "."
#endif

/*
 * The format of the visibility map was changed with this 9.6 commit.
 */
#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201603011

/*
 * pg_multixact format changed in 9.3 commit 0ac5ad5134f2769ccbaefec73844f85,
 * ("Improve concurrency of foreign key locking") which also updated catalog
 * version to this value. pg_upgrade behavior depends on whether old and new
 * server versions are both newer than this, or only the new one is.
 */
#define MULTIXACT_FORMATCHANGE_CAT_VER 201301231

/*
 * large object chunk size added to pg_controldata,
 * commit 5f93c37805e7485488480916b4585e098d3cc883
 */
#define LARGE_OBJECT_SIZE_PG_CONTROL_VER 942

/*
 * change in JSONB format during 9.4 beta
 */
#define JSONB_FORMAT_CHANGE_CAT_VER 201409291

/*
 * Each relation is represented by a relinfo structure.
 */
typedef struct
{
	/* Can't use NAMEDATALEN; not guaranteed to be same on client */
	char	   *nspname;		/* namespace name */
	char	   *relname;		/* relation name */
	Oid			reloid;			/* relation OID */
	RelFileNumber relfilenumber;	/* relation file number */
	Oid			indtable;		/* if index, OID of its table, else 0 */
	Oid			toastheap;		/* if toast table, OID of base table, else 0 */
	char	   *tablespace;		/* tablespace path; "" for cluster default */
	bool		nsp_alloc;		/* should nspname be freed? */
	bool		tblsp_alloc;	/* should tablespace be freed? */
} RelInfo;

typedef struct
{
	RelInfo    *rels;
	int			nrels;
} RelInfoArr;

/*
 * Structure to store logical replication slot information.
 */
typedef struct
{
	char	   *slotname;		/* slot name */
	char	   *plugin;			/* plugin */
	bool		two_phase;		/* can the slot decode 2PC? */
	bool		caught_up;		/* has the slot caught up to latest changes? */
	bool		invalid;		/* if true, the slot is unusable */
	bool		failover;		/* is the slot designated to be synced to the
								 * physical standby? */
} LogicalSlotInfo;

typedef struct
{
	int			nslots;			/* number of logical slot infos */
	LogicalSlotInfo *slots;		/* array of logical slot infos */
} LogicalSlotInfoArr;

/*
 * The following structure represents a relation mapping.
 */
typedef struct
{
	const char *old_tablespace;
	const char *new_tablespace;
	const char *old_tablespace_suffix;
	const char *new_tablespace_suffix;
	Oid			db_oid;
	RelFileNumber relfilenumber;

	/* the rest are used only for logging and error reporting */
	char	   *nspname;		/* namespaces */
	char	   *relname;
} FileNameMap;

/*
 * Structure to store database information
 */
typedef struct
{
	Oid			db_oid;			/* oid of the database */
	char	   *db_name;		/* database name */
	char		db_tablespace[MAXPGPATH];	/* database default tablespace
											 * path */
	RelInfoArr	rel_arr;		/* array of all user relinfos */
	LogicalSlotInfoArr slot_arr;	/* array of all LogicalSlotInfo */
	int			nsubs;			/* number of subscriptions */
} DbInfo;

/*
 * Locale information about a database.
 */
typedef struct
{
	char	   *db_collate;
	char	   *db_ctype;
	char		db_collprovider;
	char	   *db_iculocale;
	int			db_encoding;
} DbLocaleInfo;

typedef struct
{
	DbInfo	   *dbs;			/* array of db infos */
	int			ndbs;			/* number of db infos */
} DbInfoArr;

/*
 * The following structure is used to hold pg_control information.
 * Rather than using the backend's control structure we use our own
 * structure to avoid pg_control version issues between releases.
 */
typedef struct
{
	uint32		ctrl_ver;
	uint32		cat_ver;
	char		nextxlogfile[25];
	uint32		chkpnt_nxtxid;
	uint32		chkpnt_nxtepoch;
	uint32		chkpnt_nxtoid;
	uint32		chkpnt_nxtmulti;
	uint32		chkpnt_nxtmxoff;
	uint32		chkpnt_oldstMulti;
	uint32		chkpnt_oldstxid;
	uint32		align;
	uint32		blocksz;
	uint32		largesz;
	uint32		walsz;
	uint32		walseg;
	uint32		ident;
	uint32		index;
	uint32		toast;
	uint32		large_object;
	bool		date_is_int;
	bool		float8_pass_by_value;
	uint32		data_checksum_version;
} ControlData;

/*
 * Enumeration to denote transfer modes
 */
typedef enum
{
	TRANSFER_MODE_CLONE,
	TRANSFER_MODE_COPY,
	TRANSFER_MODE_COPY_FILE_RANGE,
	TRANSFER_MODE_LINK,
} transferMode;

/*
 * Enumeration to denote pg_log modes
 */
typedef enum
{
	PG_VERBOSE,
	PG_STATUS,					/* these messages do not get a newline added */
	PG_REPORT_NONL,				/* these too */
	PG_REPORT,
	PG_WARNING,
	PG_FATAL,
} eLogType;

/*
 * cluster
 *
 *	information about each cluster
 */
typedef struct
{
	ControlData controldata;	/* pg_control information */
	DbLocaleInfo *template0;	/* template0 locale info */
	DbInfoArr	dbarr;			/* dbinfos array */
	char	   *pgdata;			/* pathname for cluster's $PGDATA directory */
	char	   *pgconfig;		/* pathname for cluster's config file
								 * directory */
	char	   *bindir;			/* pathname for cluster's executable directory */
	char	   *pgopts;			/* options to pass to the server, like pg_ctl
								 * -o */
	char	   *sockdir;		/* directory for Unix Domain socket, if any */
	unsigned short port;		/* port number where postmaster is waiting */
	uint32		major_version;	/* PG_VERSION of cluster */
	char		major_version_str[64];	/* string PG_VERSION of cluster */
	uint32		bin_version;	/* version returned from pg_ctl */
	const char *tablespace_suffix;	/* directory specification */
} ClusterInfo;

/*
 *	LogOpts
 */
typedef struct
{
	FILE	   *internal;		/* internal log FILE */
	bool		verbose;		/* true -> be verbose in messages */
	bool		retain;			/* retain log files on success */
	/* Set of internal directories for output files */
	char	   *rootdir;		/* Root directory, aka pg_upgrade_output.d */
	char	   *basedir;		/* Base output directory, with timestamp */
	char	   *dumpdir;		/* Dumps */
	char	   *logdir;			/* Log files */
	bool		isatty;			/* is stdout a tty */
} LogOpts;

/*
 *	UserOpts
 */
typedef struct
{
	bool		check;			/* true -> ask user for permission to make
								 * changes */
	bool		do_sync;		/* flush changes to disk */
	transferMode transfer_mode; /* copy files or link them? */
	int			jobs;			/* number of processes/threads to use */
	char	   *socketdir;		/* directory to use for Unix sockets */
char *sync_method;
|
2010-10-19 23:38:16 +02:00
|
|
|
} UserOpts;

typedef struct
{
	char	   *name;
	int			dbnum;
} LibraryInfo;

/*
 * OSInfo
 */
typedef struct
{
	const char *progname;		/* complete pathname for this program */
	char	   *user;			/* username for clusters */
	bool		user_specified; /* user specified on command-line */
	char	  **old_tablespaces;	/* tablespaces */
	int			num_old_tablespaces;
	LibraryInfo *libraries;		/* loadable libraries */
	int			num_libraries;
	ClusterInfo *running_cluster;
} OSInfo;


/*
 * Global variables
 */
extern LogOpts log_opts;
extern UserOpts user_opts;
extern ClusterInfo old_cluster,
			new_cluster;
extern OSInfo os_info;


/* check.c */

void		output_check_banner(bool live_check);
void		check_and_dump_old_cluster(bool live_check);
void		check_new_cluster(void);
void		report_clusters_compatible(void);
void		issue_warnings_and_set_wal_level(void);
void		output_completion_banner(char *deletion_script_file_name);
void		check_cluster_versions(void);
void		check_cluster_compatibility(bool live_check);
void		create_script_for_old_cluster_deletion(char **deletion_script_file_name);


/* controldata.c */

void		get_control_data(ClusterInfo *cluster, bool live_check);
void		check_control_data(ControlData *oldctrl, ControlData *newctrl);
void		disable_old_cluster(void);


/* dump.c */

void		generate_old_dump(void);


/* exec.c */

#define EXEC_PSQL_ARGS "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc --dbname=template1"

bool		exec_prog(const char *log_filename, const char *opt_log_file,
					  bool report_error, bool exit_on_error, const char *fmt,...) pg_attribute_printf(5, 6);
void		verify_directories(void);
bool		pid_lock_file_exists(const char *datadir);


/* file.c */

void		cloneFile(const char *src, const char *dst,
					  const char *schemaName, const char *relName);
void		copyFile(const char *src, const char *dst,
					 const char *schemaName, const char *relName);
void		copyFileByRange(const char *src, const char *dst,
							const char *schemaName, const char *relName);
void		linkFile(const char *src, const char *dst,
					 const char *schemaName, const char *relName);
void		rewriteVisibilityMap(const char *fromfile, const char *tofile,
								 const char *schemaName, const char *relName);
void		check_file_clone(void);
void		check_copy_file_range(void);
void		check_hard_link(void);

/* fopen_priv() is no longer different from fopen() */
#define fopen_priv(path, mode) fopen(path, mode)


/* function.c */

void		get_loadable_libraries(void);
void		check_loadable_libraries(void);


/* info.c */

FileNameMap *gen_db_file_maps(DbInfo *old_db,
							  DbInfo *new_db, int *nmaps, const char *old_pgdata,
							  const char *new_pgdata);
void		get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check);
int			count_old_cluster_logical_slots(void);
int			count_old_cluster_subscriptions(void);


/* option.c */

void		parseCommandLine(int argc, char *argv[]);
void		adjust_data_dir(ClusterInfo *cluster);
void		get_sock_dir(ClusterInfo *cluster, bool live_check);


/* relfilenumber.c */

void		transfer_all_new_tablespaces(DbInfoArr *old_db_arr,
										 DbInfoArr *new_db_arr, char *old_pgdata, char *new_pgdata);
void		transfer_all_new_dbs(DbInfoArr *old_db_arr,
								 DbInfoArr *new_db_arr, char *old_pgdata, char *new_pgdata,
								 char *old_tablespace);


/* tablespace.c */

void		init_tablespaces(void);


/* server.c */

PGconn	   *connectToServer(ClusterInfo *cluster, const char *db_name);
PGresult   *executeQueryOrDie(PGconn *conn, const char *fmt,...) pg_attribute_printf(2, 3);

char	   *cluster_conn_opts(ClusterInfo *cluster);

bool		start_postmaster(ClusterInfo *cluster, bool report_and_exit_on_error);
void		stop_postmaster(bool in_atexit);
uint32		get_major_server_version(ClusterInfo *cluster);
void		check_pghost_envvar(void);


/* util.c */

char	   *quote_identifier(const char *s);
int			get_user_info(char **user_name_p);
void		check_ok(void);
void		report_status(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
void		pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
void		pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn();
void		end_progress_output(void);
void		cleanup_output_dirs(void);
void		prep_status(const char *fmt,...) pg_attribute_printf(1, 2);
void		prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2);
unsigned int str2uint(const char *str);


/* version.c */

bool		check_for_data_types_usage(ClusterInfo *cluster,
									   const char *base_query,
									   const char *output_path);
bool		check_for_data_type_usage(ClusterInfo *cluster,
									  const char *type_name,
									  const char *output_path);
void		old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
void		old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
void		old_9_6_invalidate_hash_indexes(ClusterInfo *cluster,
											bool check_mode);
void		old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster);
void		report_extension_updates(ClusterInfo *cluster);


/* parallel.c */
void		parallel_exec_prog(const char *log_file, const char *opt_log_file,
							   const char *fmt,...) pg_attribute_printf(3, 4);
void		parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr,
										  char *old_pgdata, char *new_pgdata,
										  char *old_tablespace);
bool		reap_child(bool wait_for_child);