Commit 6b80394781 introduced integer comparison functions designed
to be as efficient as possible while avoiding overflow. This
commit makes use of these functions in many of the in-tree qsort()
comparators to help ensure transitivity. Many of these comparator
functions should also see a small performance boost.
Author: Mats Kindahl
Reviewed-by: Andres Freund, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
Commit 5b2f4afffe refactored find_other_exec() and in the process
created pipe_read_line() into a static routine for reading a single
line of output, aimed at reading version numbers. Commit a7e8ece41
later exposed it externally in order to read a postgresql.conf GUC
using "postgres -C ..". Further, f06b1c598 also made use of it for
reading a version string much like find_other_exec(). The internal
variable remained "pgver", even when used for other purposes.
Since the function requires passing a buffer and its size, and at
most size - 1 bytes will be read via fgets(), there is a truncation
risk when using this for reading GUCs (like how pg_rewind does,
though the risk in this case is marginal).
To keep this as generic functionality for reading a line from a pipe,
this refactors pipe_read_line() into returning an allocated buffer
containing all of the line to remove the risk of silent truncation.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/DEDF73CE-D528-49A3-9089-B3592FD671A9@yesql.se
This commit introduces a new subscription option named 'failover', which
provides users with the ability to set the failover property of the
replication slot on the publisher when creating or altering a
subscription.
This uses the replication commands introduced by commit 7329240437 to
enable the failover option for a logical replication slot.
If the failover option is set to true, the associated replication slots
(i.e. the main slot and the table sync slots) in the upstream database are
enabled to be synchronized to the standbys. Note that the capability to
sync the replication slots will be added in subsequent commits.
Thanks to Masahiko Sawada for the design inputs.
Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
This commit adds the failover property to the replication slot. The
failover property indicates whether the slot will be synced to the standby
servers, enabling the resumption of corresponding logical replication
after failover. But note that this commit does not yet include the
capability to sync the replication slot; the subsequent commits will add
that capability.
A new optional parameter 'failover' is added to the
pg_create_logical_replication_slot() function. We will also enable to set
'failover' option for slots via the subscription commands in the
subsequent commits.
The value of the 'failover' flag is displayed as part of
pg_replication_slots view.
Author: Hou Zhijie, Shveta Malik, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
This is a rework of 7021d3b176, where we relied on forcing
max_logical_replication_workers to 0 in the postgres command. This
commit now prevents logical replication launchers to start using -b and
a backend-side check based on IsBinaryUpgrade, effective when upgrading
from 17 and newer versions.
This commit improves the comments explaining why this restriction is
necessary.
This discussion was on hold until we were sure how to add support for
subscribers in pg_upgrade, something now done thanks to 9a17be1e24.
Reviewed-by: Álvaro Herrera, Amit Kapila, Tom Lane
Discussion: https://postgr.es/m/ZU2TeVkUg5qEi7Oy@paquier.xyz
During upgrade, when pg_restore performs CREATE DATABASE, bgwriter or
checkpointer may flush buffers and hold a file handle for pg_largeobject,
so later TRUNCATE pg_largeobject command will fail if OS (such as older
Windows versions) doesn't remove an unlinked file completely till it's
open. The probability of seeing this behavior is higher in this test
because we use wal_level as logical via allows_streaming => 'logical'
which in turn set shared_buffers as 1MB and make it more probable for
bgwriter to hold the file handle.
Diagnosed-by: Alexander Lakhin
Author: Hayato Kuroda, Amit Kapila
Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
This patch changes the existing 'conflicting' field to 'conflict_reason'
in pg_replication_slots. This new field indicates the reason for the
logical slot's conflict with recovery. It is always NULL for physical
slots, as well as for logical slots which are not invalidated. The
non-NULL values indicate that the slot is marked as invalidated. Possible
values are:
wal_removed = required WAL has been removed.
rows_removed = required rows have been removed.
wal_level_insufficient = the primary doesn't have a wal_level sufficient
to perform logical decoding.
The existing users of 'conflicting' column can get the same answer by
using 'conflict_reason' IS NOT NULL.
Author: Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz
New TAP tests added by commits 9a17be1e and 4710b67d missed to convert
warnings to FATAL. This commit fixes that.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.
Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker). Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.
To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.
The subscription's replication origin is needed to ensure that we don't
replicate anything twice.
To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription by using
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.
pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.
Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests. Occasionally, these scripts produce warnings. These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.
This changes all warnings into fatal errors, by replacing
use warnings;
by
use warnings FATAL => 'all';
in all Perl files.
Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
During a pg_upgrade test using an old dump, all references to the old
regress shared library path (so, dylib or dll) are updated to point to
the library path used by the new build, to ensure a consistent
comparison between the old and new dumps.
The test previously relied on a hardcoded value of "src/test/regress/"
to build the new path value, which would point to an incorrect location
for the meson and vpath builds. This is replaced by REGRESS_SHLIB, able
to point to the correct location of the regress shared library.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/a628d8ad-a08a-2eab-4ca9-641bc82d3193@gmail.com
Backpatch-through: 15
This commit removes the restriction that would not apply filters to the
dumps used for comparison in the TAP test of pg_upgrade when using the
same base version for the old and new nodes.
The previous logic would fail on Windows if loading a dump while using
the same set of binaries for the old and new nodes, as the library
dependencies updated in the old dump would append CRLFs to the dump
file as it is treated as a text file. The dump filtering logic replaces
all CRLFs (\r\n) by LFs (\n), which is able to prevent this issue.
When the old and new versions of the binaries are the same,
AdjustUpgrade removes all blank lines, removes version-based comments
generated by pg_dump and replaces CRLFs by LFs.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/60d434b9-53d9-9ea1-819b-efebdcf44e41@gmail.com
Backpatch-through: 15
The failed test was trying to validate that the two_phase option for a
slot is retained after the upgrade. The problem was that it didn't get
enabled before the upgrade so expecting to be enabled after the upgrade
was not right in the first place. This is a timing issue because we enable
the two_phase once the initial sync for all the tables is finished.
Normally, we wait for the two_phase state to be enabled but this test
missed that step.
Per buildfarm.
Author: Hayato Kuroda
Discussion: https://postgr.es/m/TY3PR01MB98892396D1071FC4320D6B31F586A@TY3PR01MB9889.jpnprd01.prod.outlook.com
We allow to upgrade the slot iff there is no pending WAL to be processed.
The test first disables the subscription to avoid unnecessary LOGs on the
subscriber and then stops the publisher node. It is quite possible that
just before the shutdown of the publisher, autovacuum generates some WAL
record that needs to be processed, so just disable the autovacuum for this
test.
Per buildfarm.
Author: Hayato Kuroda
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/OS3PR01MB9882FED1F0060468FB01B9DAF583A@OS3PR01MB9882.jpnprd01.prod.outlook.com
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation. We have done this consistent with established
practice at the time to not require these tools for building from a
tarball. Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.
Now this has at least two problems:
One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball. This is pretty
complicated, but it works so far for autoconf/make. It does not
currently work for meson; you can currently only build with meson from
a git checkout. Making meson builds work from a tarball seems very
difficult or impossible. One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree. So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree. So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.
Second, there is increased interest nowadays in precisely tracking the
origin of software. We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs. But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.
The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball. The tarball now only contains
what is in the git tree (*). Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant. And of course we
want to get the meson build system working universally.
This commit removes the make distprep target altogether. The make
dist target continues to do its job, it just doesn't call distprep
anymore.
(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep. This is unchanged for now.
The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep. (In practice, it is probably obsolete given
that git clean is available.)
The following programs are now hard build requirements in configure
(they were already required by meson.build):
- bison
- flex
- perl
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
The logical replication launcher may start apply workers during an
upgrade. This could be the cause of corruptions on a new cluster if
these are able to apply changes before the physical files are copied
over to the new cluster.
The chance of being able to do so is small as pg_upgrade uses its own
port and unix domain directory (the latter is customizable with
--socketdir), but just preventing the launcher to start is safer at the
end, because we are then sure that no changes will be applied. Like
29d0a77fa6 for max_slot_wal_keep_size, this is only set when a cluster
uses v17 or newer.
Author: Vignesh C
Discussion: https://postgr.es/m/CALDaNm2g9ZKf=y8X6z6MsLCuh8WwU-=Q6pLj35NFi2M5BZNS_A@mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
While reading information from the old cluster, a list of logical
slots is fetched. At the later part of upgrading, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.
If the old node has invalid slots or slots with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.
The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.
Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
Commit cda6a8d01d removed a few datatypes, but didn't update
pg_upgrade --check to throw error if these types are used. So the users
find that pg_upgrade --check tells them that everything is fine, only to
fail when the real upgrade is attempted.
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Discussion: https://postgr.es/m/202309201654.ng4ksea25mti@alvherre.pgsql
This commit allows specifying a --sync-method in several frontend
utilities that must synchronize many files to disk (initdb,
pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade).
On Linux, users can specify "syncfs" to synchronize the relevant
file systems instead of calling fsync() for every single file. In
many cases, using syncfs() is much faster.
As with recovery_init_sync_method, this new option comes with some
caveats. The descriptions of these caveats have been moved to a
new appendix section in the documentation.
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier, Thomas Munro, Robert Haas, Justin Pryzby
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
Commit 7b378237aa added a status message to pg_upgrade that is 60
characters wide. Since the MESSAGE_WIDTH macro is currently set to
60, there is no space between this new status message and the "ok"
or "failed" indicator appended when the step completes. To fix
this problem, this commit increases the value of MESSAGE_WIDTH to
62.
Suggested-by: Bharath Rupireddy
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/CALj2ACVVvk1cYLtWVxHv%3DZ1Ubq%3DUES9fhKbUU4c9k4W%2BfEDnbw%40mail.gmail.com
Backpatch-through: 16
The new_cluster parameter in check_for_new_tablespace_dir was
shadowing the globally defined new_cluster variable, causing
compiler warnings when running with -Wshadow. The function is
only applicable to the new cluster, so remove the parameter
rather than rename to match check_new_cluster_is_empty which
also only applies to the new cluster.
Author: Peter Smith <peter.b.smith@fujitsu.com>
Discussion: https://postgr.es/m/CAHut+PvS_PHLntWy1yTgXv0O1tWm4iVcKBQFzpoQRDsm2Ce_Fg@mail.gmail.com
When the cluster failed the pg_controldata check for clean shut
down we only reported that it did so, not why. The state of the
cluster can be important information when diagnosing the failed
upgrade attempt, so instead include it in the error message.
Discussion: https://postgr.es/m/E0D5EA16-A085-4753-8DDC-C7055048B439@yesql.se
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.
To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.
An invalid database cannot be connected to anymore, but can still be
dropped.
Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.
Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.
Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
vacuum_defer_cleanup_age was introduced before hot_standby_feedback and
replication slots existed. It is hard to use reasonably - commonly it will
either be set too low (not preventing recovery conflicts, while still causing
some bloat), or too high (causing a lot of bloat). The alternatives do not
have that issue.
That on its own might not be sufficient reason to remove
vacuum_defer_cleanup_age, but it also complicates computation of xid
horizons. See e.g. the bug fixed in be504a3e97. It also is untested.
This commit removes TransactionIdRetreatSafely(), as there are no users
anymore. There might be potential future users, hence noting that here.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned. The exact requirements vary by OS and file system (typically
sectors and/or memory pages). The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size. There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.
Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.
Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer
pool is also aligned in shared memory.
WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.
BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).
If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0. This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead. That's an existing and tested class of system.
Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
The check for the number of roles in the target cluster for an upgrade
selects the existing roles and performs a COUNT(*) over the result. A
value of one is the expected query result value indicating that only
the install user is present in the new cluster. The result was converted
with the function for converting a string containing an Oid into a numeric,
which avoids potential overflow but makes the code less readable since
it's not actually an Oid at all.
Discussion: https://postgr.es/m/41AB5F1F-4389-4B25-9668-5C430375836C@yesql.se
Previously, the default encoding was derived from the locale when
using libc; while the default was always UTF-8 when using ICU. That
would throw an error when the locale was not compatible with UTF-8.
This commit causes initdb to derive the default encoding from the
locale for both providers. If --no-locale is specified (or if the
locale is C or POSIX), the default encoding will be UTF-8 for ICU
(because ICU does not support SQL_ASCII) and SQL_ASCII for libc.
Per buildfarm failure on system "hoverfly" related to commit
27b62377b4.
Discussion: https://postgr.es/m/d191d5841347301a8f1238f609471ddd957fc47e.camel%40j-davis.com
Previously, pg_upgrade checked that the old and new clusters were
compatible, including the locale and encoding. But the new cluster was
just created, and only template0 from the new cluster will be
preserved (template1 and postgres are both recreated during the
upgrade process).
Because template0 is not sensitive to locale or encoding, just update
the pg_database entry to be the same as template0 from the original
cluster.
This commit makes it easier to change the default initdb locale or
encoding settings without causing needless incompatibilities.
Discussion: https://postgr.es/m/d62b2874-729b-d26a-2d0a-0d64f509eca4@enterprisedb.com
Reviewed-by: Peter Eisentraut