Commit Graph

356 Commits

Author SHA1 Message Date
Amit Kapila 6f132ed693 Allow synced slots to have their inactive_since.
This commit does two things:
1) Maintains inactive_since for sync slots whenever the slot is released
just like any other regular slot.

2) Ensures the value is set to the current timestamp during the promotion
of standby to help correctly interpret the time after promotion. We don't
want the slots to appear inactive for a long time after promotion if they
haven't been synchronized recently. This would also avoid the invalidation
of such slots immediately after promotion if tomorrow we have a feature
that invalidates slots based on their inactivity time. Whoever acquires
the slot i.e. makes the slot active will reset it to NULL.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada
Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
2024-04-05 09:48:49 +05:30
Amit Kapila 2ec005b4e2 Ensure that the sync slots reach a consistent state after promotion without losing data.
We were directly copying the LSN locations while syncing the slots on the
standby. Now, it is possible that at some particular restart_lsn there are
some running xacts, which means if we start reading the WAL from that
location after promotion, we won't reach a consistent snapshot state at
that point. However, on the primary, we would have already been in a
consistent snapshot state at that restart_lsn so we would have just
serialized the existing snapshot.

To avoid this problem we will use the advance_slot functionality unless
the snapshot already exists at the synced restart_lsn location. This will
help us to ensure that snapbuilder/slot statuses are updated properly
without generating any changes. Note that the synced slot will remain as
RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a
consistent snapshot state after which they will be marked as
RS_PERSISTENT.

Per buildfarm

Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2024-04-03 14:04:59 +05:30
Alexander Korotkov 06c418e163 Implement pg_wal_replay_wait() stored procedure
pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed before starting the transaction.
This option is useful when the user makes some data changes on primary and
needs a guarantee to see these changes on standby.

The queue of waiters is stored in the shared memory array sorted by LSN.
During replay of WAL waiters whose LSNs are already replayed are deleted from
the shared memory array and woken up by setting of their latches.

pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
the snapshot could prevent the replay of WAL records implying a kind of
self-deadlock.  This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working in a non-atomic context,
not a function.

Catversion is bumped.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
2024-04-02 22:48:03 +03:00
Amit Kapila 6d49c8d4b4 Change last_inactive_time to inactive_since in pg_replication_slots.
Commit a11f330b55 added last_inactive_time to show the last time the slot
was inactive. But, it tells the last time that a currently-inactive slot
previously *WAS* active. This could be unclear, so we changed the name to
inactive_since.

Reported-by: Robert Haas
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com
2024-03-27 09:27:44 +05:30
Amit Kapila a11f330b55 Track last_inactive_time in pg_replication_slots.
This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

The 'last_inactive_time' will be useful on production servers to debug and
analyze inactive replication slots. It will also help to know the lifetime
of a replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down.

The 'last_inactive_time' will also be useful to implement inactive
timeout-based replication slot invalidation in a future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-25 16:34:33 +05:30
Amit Kapila 6ae701b437 Track invalidation_reason in pg_replication_slots.
Till now, the reason for replication slot invalidation is not tracked
directly in pg_replication_slots. A recent commit 007693f2a3 added
'conflict_reason' to show the reasons for slot conflict/invalidation, but
only for logical slots.

This commit adds a new column 'invalidation_reason' to show invalidation
reasons for both physical and logical slots. And, this commit also turns
'conflict_reason' text column to 'conflicting' boolean column (effectively
reverting commit 007693f2a3). The 'conflicting' column is true for
invalidation reasons 'rows_removed' and 'wal_level_insufficient' because
those make the slot conflict with recovery. When 'conflicting' is true,
one can now look at the new 'invalidation_reason' column for the reason
for the logical slot's conflict with recovery.

The new 'invalidation_reason' column will also be useful to track other
invalidation reasons in the future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-22 13:52:05 +05:30
Michael Paquier 071e3ad59d Add basic TAP tests for the low-level backup method, take two
There are currently no tests for the low-level backup method where
pg_backup_start() and pg_backup_stop() are involved while taking a
file-system backup.  The tests introduced in this commit rely on a
background psql process to make sure that the backup is taken while the
session doing the SQL start and stop calls remains alive.

Two cases are checked here with the backup taken:
- Recovery without a backup_label, leading to a corrupted state.
- Recovery with a backup_label, with a consistent state reached.
Both cases cross-check some patterns in the logs generated when running
recovery.

Compared to the first attempt in 99b4a63bef, this includes a couple of
fixes making the CI stable (5 runs succeeded here):
- Add the file to the list of tests in meson.build.
- Race condition with the first WAL segment that we expect in the
primary's archives, by adding a poll on pg_stat_archiver.  The second
segment with the checkpoint record is archived thanks to pg_backup_stop
waiting for it.
- Fix failure of test where the backup_label does not exist.  The
cluster inherits the configuration of the first node; it was attempting
to store segments in the first node's archives, triggering failures with
copy on Windows.
- Fix failure of test on Windows because of incorrect parsing of the
backup_file in the success case.  The data of the backup_label file is
retrieved from the output pg_backup_stop() from a BackgroundPsql written
directly to the backup's data folder.  This would include CRLFs (\r\n),
causing the startup process to fail at the beginning of recovery when
parsing the backup_label because only LFs (\n) are allowed.

Author: David Steele
Discussion: https://postgr.es/m/f20fcc82-dadb-478d-beb4-1e2ffb0ace76@pgmasters.net
2024-03-15 08:29:54 +09:00
Michael Paquier 6cb1b632b3 Revert "Add basic TAP tests for the low-level backup method"
This reverts commit 99b4a63bef.  The test is proving to be unstable,
so revert it for now.

One of the failures seen involves the cluster started without the
backup_label, where the archives of the primary are overwritten, causing
recovery failures on Windows.  This is simple to fix, but there is
another issue that's creeping behind in the form of an "invalid data in
file" ERROR when parsing the backup_label for the second recovery case,
as an effect of the backup_label data written after retrieving its
contents from pg_backup_stop().
_
Per buildfarm member sidewinder.
2024-03-14 13:19:12 +09:00
Michael Paquier 99b4a63bef Add basic TAP tests for the low-level backup method
There are currently no tests for the low-level backup method where
pg_backup_start() and pg_backup_stop() are involved while taking a
file-system backup.  The tests introduced in this commit rely on a
background psql process to make sure that the backup is taken while the
session doing the SQL start and stop calls remains alive.

Two cases are checked here with the backup taken:
- Recovery without a backup_label, leading to a corrupted state.
- Recovery with a backup_label, with a consistent state reached.
Both cases cross-check some patterns in the logs generated when running
recovery.

Author: David Steele
Discussion: https://postgr.es/m/f20fcc82-dadb-478d-beb4-1e2ffb0ace76@pgmasters.net
2024-03-14 10:49:52 +09:00
Amit Kapila 0b84f5c419 Fix a random failure in 038_save_logical_slots_shutdown.pl.
The test ensures that all the WAL on the publisher is sent to the
subscriber before shutdown by comparing the confirmed_flush_lsn of the
associated slot with the shutdown_checkpoint WAL location. But if the
shutdown_checkpoint location falls into a new page in the WAL then the
check won't work. So, ensure that the shutdown_checkpoint WAL record
doesn't fall into a new page.

Reported-by: Bharath Rupireddy
Author: Bharath Rupireddy
Reviewed-by: Vignesh C, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/CALj2ACVLzH5CN-h9=S26mdRHPuJ9yDLUw70yh4JOiPw03WL0CQ@mail.gmail.com
2024-03-13 08:33:26 +05:30
Michael Paquier d6e171fed6 Keep replication slot statistics on invalidation
The code path in charge of invalidating a replication slot includes a
call to pgstat_drop_replslot(), which would result in removing the
statistics of the slot once invalidated.  However, there is no need to
remove the statistics of an invalidated slot as one could still be
interested in looking at them to understand the activity of the slot
until its actual removal.

The initial design of the feature committed in be87200efd used the
approach to drop the slots, which is likely why the statistics were
still removed during the invalidation.

Another problem with this operation is that it was done without holding
ReplicationSlotAllocationLock, leaving it unprotected on concurrent
activity.  This part is arguably a bug, but that's a limited problem in
practice so no backpatch is done.

In passing, this commit adds a test to check this behavior.  The only
remaining code path where slot statistics are dropped now related to the
slot getting dropped.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZermH08Eq6YydHpO@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-12 14:22:31 +09:00
Amit Kapila bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server making them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
Michael Paquier 65db0cfb4c Revert "Add recovery TAP test for race condition with slot invalidations"
This reverts commit 08a52ab151, due to some sporadic instability in
the test.  Getting the test right should require some redesign with a
second injection point, but let's revert it for now to avoid these
issues in the CI as a lot of patches are under discussion in this last
commit fest.

Per buildfarm members hachi and gokiburi.

Discussion: https://postgr.es/m/ZekQQHCrIqLVpGz5@paquier.xyz
2024-03-07 09:57:52 +09:00
Michael Paquier 08a52ab151 Add recovery TAP test for race condition with slot invalidations
This commit adds a recovery test to provide coverage for the bug fixed
in 818fefd8fd, using an injection point to wait just after the process
of an active slot is killed.  The trick is to give enough time for
effective_xmin and effective_catalog_xmin to advance so as the slot
invalidation robustness can be checked since the active process is
killed without holding its slot's mutex for a short time.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-06 14:39:40 +09:00
Alvaro Herrera 1a2654b32b
Rework redundant code in subtrans.c
When this code was written the duplicity didn't matter, but with all the
SLRU-bank stuff we just added, it has become excessive.  Turn it into a
simpler loop with no code duplication.  Also add a test so that this
code becomes covered.

Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql
2024-03-05 12:09:18 +01:00
Michael Paquier eca2c1ea85 Add PostgreSQL::Test::Cluster::wait_for_event()
Per a demand from the author and the reviewer of this commit, this adds
to Cluster.pm a helper routine that can be used to monitor when a
process reaches a wanted wait event.  This can be used in combination
with the module injection_points for the "wait" callback, though it is
not limited to it as this monitors pg_stat_activity for a wait_event and
a backend_type.

Author: Bertrand Drouvot
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZeBB4RMPEZ06TcdY@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-04 10:27:50 +09:00
Michael Paquier 6782709df8 Add regression test for restart points during promotion
This test serves as a way to demonstrate how to use the features
introduced in 37b369dc67, while providing coverage for 7863ee4def
that caused the startup process to throw "PANIC: could not locate a
valid checkpoint record" when starting recovery.  The test checks that a
node is able to properly restart following a crash when a restart point
was finishing across a promotion, with an injection point added in the
middle of CreateRestartPoint() to stop the restartpoint in flight.  Note
that this test fails when 7863ee4def is reverted.

Kyotaro Horiguchi is the original author of this test, that has been
originally posted on the thread where 7863ee4def was discussed.  I
have just upgraded and polished it to rely on injection points, making
it much cheaper to reproduce the failure.

This test requires injection points to be enabled in the builds, hence
meson and ./configure need an update to pass this knowledge down to the
test.  The name of the new injection point follows the same naming
convention as 6a1ea02c49.  The Makefile's EXTRA_INSTALL of recovery
TAP tests is updated to include modules/injection_points.

Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Andrey Borodin, Bertrand Drouvot
Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz
2024-03-04 09:49:03 +09:00
Amit Kapila def0ce3370 Fix BF failure introduced by commit b3f6b14cf4.
The test added by commit b3f6b14cf4 uses a non-superuser and forgot to set
up pg_hba.conf to allow connections from it. The special setup is only
needed on Windows machines that don't use UNIX sockets.

As per buildfarm

Discussion: https://postgr.es/m/CAJpy0uCfrSspV1x3VWkgamqyhYaUWQZpP0nqjJx4YPvKqN6P_A@mail.gmail.com
2024-03-01 10:25:36 +05:30
Amit Kapila b3f6b14cf4 Fixups for commit 93db6cbda0.
Ensure to set always-secure search path for both local and remote
connections during slot synchronization, so that malicious users can't
redirect user code (e.g. operators).

In the passing, improve the name of define, remove spurious return
statement, and a minor change in one of the comments.

Author: Bertrand Drouvot and Shveta Malik
Reviewed-by: Amit Kapila, Peter Smith
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com
2024-02-29 09:45:20 +05:30
Amit Kapila d13ff82319 Fix BF failure in commit 93db6cbda0.
The code to match the required LOG in the test was not robust enough to
match it. It was using a very specific format to search the required
message which doesn't work when one uses log_error_verbosity = verbose.

Author: Hou Zhijie
Discussion: https://postgr.es/m/CAA4eK1KcQSk7wzC7Zfrth9OhrjW2HvxL4tKgU42qqH7p6jn+FA@mail.gmail.com
2024-02-22 18:26:40 +05:30
Amit Kapila 93db6cbda0 Add a new slot sync worker to synchronize logical slots.
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.

The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.

A new parameter sync_replication_slots enables or disables this new
process.

On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.

A functionality to allow logical walsenders to wait for the physical will
be done in a subsequent commit.

Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-22 15:25:15 +05:30
Amit Kapila 801792e528 Improve ERROR/LOG messages added by commits ddd5f4f54a and 7a424ece48.
Additionally, in slotsync.c, replace one StringInfoData variable usage
with a constant string to avoid palloc/pfree. Also, replace the inclusion
of "logical.h" with "slot.h" to prevent the exposure of unnecessary
implementation details.

Reported-by: Kyotaro Horiguchi, Masahiko Sawada
Author: Shveta Malik based on suggestions by Robert Haas and Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Amit Kapila
Discussion: https://postgr.es/m/20240214.162652.773291409747353211.horikyota.ntt@gmail.com
Discussion: https://postgr.es/m/20240219.134015.1888940527023074780.horikyota.ntt@gmail.com
Discussion: https://postgr.es/m/CAD21AoCYXhDYOQDAS-rhGasC2T+tYbV=8Y18o94sB=5AxcW+yA@mail.gmail.com
2024-02-22 11:17:00 +05:30
Noah Misch 0e162810df Fix test race between primary XLOG_RUNNING_XACTS and standby logical slot.
Before the previous commit, the test could hang until
LOG_SNAPSHOT_INTERVAL_MS (15s), until checkpoint_timeout (300s), or
indefinitely.  An indefinite hang was awfully improbable.  It entailed
the test reaching checkpoint_timeout before the
DecodingContextFindStartpoint() of a CREATE SUBSCRIPTION, yet after the
preceding WAL record.  Back-patch to v16, which introduced the test.

Bertrand Drouvot, reported by Noah Misch.

Discussion: https://postgr.es/m/20240211010227.a2.nmisch@google.com
2024-02-19 12:52:28 -08:00
Noah Misch 4791f87f34 Bound waits in 035_standby_logical_decoding.pl.
One IPC::Run::start() used an IPC::Run::timer() without checking for
expiration.  The other used no timeout or timer.  Back-patch to v16,
which introduced the test.

Reviewed by Bertrand Drouvot.

Discussion: https://postgr.es/m/20240211010227.a2.nmisch@google.com
2024-02-19 12:52:07 -08:00
Amit Kapila b7bdade6a4 Disable autovacuum on primary in 040_standby_failover_slots_sync test.
Disable autovacuum to avoid generating xid during stats update as
otherwise the new XID could then be replicated to standby at some random
point making slots at primary lag behind standby during slot sync.

As per buildfarm

Author: Hou Zhijie
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/CAA4eK1Jun8SGCoc6JEktxY_+L7GmoJWrdsx-KCEP=GL-SsWggQ@mail.gmail.com
2024-02-16 14:42:50 +05:30
Amit Kapila d9e225f275 Change the LOG level in 040_standby_failover_slots_sync.pl to DEBUG2.
Temporarily change the log level of 040_standby_failover_slots_sync.pl to
DEBUG2. This is to get more information about BF failures. We will reset
it back to default once the tests are stabilized.

Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/OS0PR01MB571633C23B2A4CAC5FB0371A944C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2024-02-16 10:13:51 +05:30
Amit Kapila 9bc1eee988 Another try to fix BF failure introduced in commit ddd5f4f54a.
Before attempting to sync the slot on standby by
pg_sync_replication_slots(), ensure that on the primary restart_lsn for
the slot has moved to a recent WAL position, by re-creating the
subscription and the logical slot.

Author: Hou Zhijie
Discussion: https://postgr.es/m/CAA4eK1+d5Lne8vCAn0un4SP9x-ZBr2-xfxg01uSfeBTSCKFZoQ@mail.gmail.com
2024-02-15 10:37:28 +05:30
Amit Kapila bd8fc1677b Fix BF introduced in commit ddd5f4f54a.
The failure is that the remote slot is not synchronized after the same
slot on standby gets invalidated. The reason was that remote_slot's
restart_lsn was lagged behind the standby's oldest WAL segment. The test
didn't ensure that remote_slot's LSN was advanced to the latest position
before we tried to sync the slots via new the function
pg_sync_replication_slots().

Discussion: https://postgr.es/m/CAA4eK1JLBi3HzenB6do3_hd78kN0UDD1mz-vumWE52XHHEq5Bw@mail.gmail.com
2024-02-14 16:16:08 +05:30
Amit Kapila ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg-create-logical-replication-slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby aka 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as these
are not consumable on standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

A functionality to automatically synchronize slots by a background worker
and allow logical walsenders to wait for the physical will be done in
subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
Amit Kapila 776621a5e4 Add a failover option to subscriptions.
This commit introduces a new subscription option named 'failover', which
provides users with the ability to set the failover property of the
replication slot on the publisher when creating or altering a
subscription.

This uses the replication commands introduced by commit 7329240437 to
enable the failover option for a logical replication slot.

If the failover option is set to true, the associated replication slots
(i.e. the main slot and the table sync slots) in the upstream database are
enabled to be synchronized to the standbys. Note that the capability to
sync the replication slots will be added in subsequent commits.

Thanks to Masahiko Sawada for the design inputs.

Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-30 16:49:28 +05:30
Alvaro Herrera 8c9da1441d
Make spelling of cancelled/cancellation consistent
This fixes places where words derived from cancel were not using their
common en-US ugly^H^H^H^Hspelling.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKG+Lrq+ty6yWXF5572qNQ8KwxGwG5n4fsEcCUap685nWvQ@mail.gmail.com
2024-01-26 12:38:15 +01:00
Michael Paquier 46d8587b50 Improve stability of recovery test 035_standby_logical_decoding
This commit tweaks a couple of things in 035_standby_logical_decoding to
hopefully stabilize it:
- Autovacuum is now disabled, as it could hold a global xmin with a
transaction.
- Conflicts are generated with command sequences that removed rows (on
catalogs, shared or non-shared, or just plain relations) followed by a
VACUUM.  This was unstable because these did not check that the horizon
moved between the SQL commands and the VACUUM.  The logic is refactored
as follows, to ensure that VACUUM removes dead rows before testing for
slot invalidation on a standby (idea suggested by Andres Freund):
-- Grab the current horizon.
-- Launch SQL commands removing rows.
-- Check that the snapshot horizon has been updated.
-- Launch VACUUM on the relation whose rows have been removed by the
first step.

Note that there are still some issues because of standby snapshot WAL
records generated by the bgwriter, but this makes the test much more
stable.

Per reports from buildfarm members dikkop and skink, with analysis and
tests from Alexander Lakhin.

While on it, fix a couple of incorrect comments.

Author: Bertrand Drouvot
Reviewed-by: Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/OSZPR01MB6310ED3CEDB531BCEDBC6AF2FD479@OSZPR01MB6310.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/bf67e076-b163-9ba3-4ade-b9fc51a3a8f6@gmail.com
Backpatch-through: 16
2024-01-23 14:46:01 +09:00
Andrew Dunstan dbad1c53e9 Add copyright notices to a few perl scripts that don't have them 2024-01-05 13:15:50 +00:00
Amit Kapila 007693f2a3 Track conflict_reason in pg_replication_slots.
This patch changes the existing 'conflicting' field to 'conflict_reason'
in pg_replication_slots. This new field indicates the reason for the
logical slot's conflict with recovery. It is always NULL for physical
slots, as well as for logical slots which are not invalidated. The
non-NULL values indicate that the slot is marked as invalidated. Possible
values are:

wal_removed = required WAL has been removed.
rows_removed = required rows have been removed.
wal_level_insufficient = the primary doesn't have a wal_level sufficient
to perform logical decoding.

The existing users of 'conflicting' column can get the same answer by
using 'conflict_reason' IS NOT NULL.

Author: Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz
2024-01-04 08:26:25 +05:30
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Michael Paquier 4b66d7b4e0 Remove unnecessary PGDATABASE settings from TAP tests
Some of the TAP tests have been historically setting the environment
variable PGDATABASE to 'postgres', which is not needed because
PostgreSQL::Test::Cluster already sets it when initialized.  This commit
removes these explicit setups.

Note that the dependency of cluster -a with PGDATABASE (from
1caef31d9e) is still documented.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXLAz5dW3ZP+Fec8g6jQMMmDyCVT+qdbye2h7QJJmhsdw@mail.gmail.com
2024-01-03 10:28:05 +09:00
Peter Eisentraut c538592959 Make all Perl warnings fatal
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests.  Occasionally, these scripts produce warnings.  These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.

This changes all warnings into fatal errors, by replacing

    use warnings;

by

    use warnings FATAL => 'all';

in all Perl files.

Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-29 18:20:00 +01:00
Peter Eisentraut a7ebd82b9e Fix a warning in Perl test code
The code was passing a scalar argument to node->restart(), but it was
expecting a hash, which causes a warning from Perl ("Odd number of
elements in hash assignment").

But the node->restart() function doesn't take a mode argument anyway.
This was probably copied from an incorrect comment (see commit
750c59d7ec).  The default restart mode is already "fast", so the test
should still be semantically correct without explicitly specifying the
mode.

Discussion: https://www.postgresql.org/message-id/e3f4bf1b-63d3-408a-b07e-d35a0fdf1b98@eisentraut.org
2023-12-27 17:15:26 +01:00
Michael Paquier c161ab74f7 Add PostgreSQL::Test::Cluster::advance_wal
This is a function that makes a node jump by N WAL segments, which is
something a couple of tests have been relying on for some cases related
to streaming, replication slot limits and logical decoding on standbys.
Hence, this centralizes the logic, while making it cheaper by relying on
pg_logical_emit_message() to emit WAL records before switching to a new
segment.

Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi, Euler Taveira
Discussion: https://postgr.es/m/CALj2ACU3R8QFCvDewHCMKjgb2w_-CMCyd6DAK=Jb-af14da5eg@mail.gmail.com
2023-12-21 10:19:17 +09:00
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Michael Paquier 96f052613f Introduce pg_stat_checkpointer
Historically, the statistics of the checkpointer have been always part
of pg_stat_bgwriter.  This commit removes a few columns from
pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent,
renamed columns (plus a new one for the reset timestamp):
- checkpoints_timed -> num_timed
- checkpoints_req -> num_requested
- checkpoint_write_time -> write_time
- checkpoint_sync_time -> sync_time
- buffers_checkpoint -> buffers_written

The fields of PgStat_CheckpointerStats and its SQL functions are renamed
to match with the new field names, for consistency.  Note that
background writer and checkpointer have been split into two different
processes in commits 806a2aee37 and bf405ba8e4.  The pgstat
structures were already split, making this change straight-forward.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
2023-10-30 09:47:16 +09:00
Michael Paquier dcd4454590 Remove unnecessary dependencies to wal_level=logical in TAP tests
A couple of TAP tests make use of wal_level=logical for nodes that do
not need to do any kind of logical decoding operations, like
subscription nodes on which changes are only applied.  This can be
confusing when reading these tests as setup examples, so let's remove
this configuration where not required (contrary to two-way logical
replication and similar more complex cases).  This simplifies the tests
a bit, making them slightly cheaper with less WAL generated overall.

Author: Hayato Kuroda
Discussion: https://postgr.es/m/TYAPR01MB5866946BCEB747ABE513ACC6F5D5A@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-10-20 10:09:27 +09:00
Alexander Korotkov e83d1b0c40 Add support event triggers on authenticated login
This commit introduces trigger on login event, allowing to fire some actions
right on the user connection.  This can be useful for logging or connection
check purposes as well as for some personalization of environment.  Usage
details are described in the documentation included, but shortly usage is
the same as for other triggers: create function returning event_trigger and
then create event trigger on login event.

In order to prevent the connection time overhead when there are no triggers
the commit introduces pg_database.dathasloginevt flag, which indicates database
has active login triggers.  This flag is set by CREATE/ALTER EVENT TRIGGER
command, and unset at connection time when no active triggers found.

Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
2023-10-16 03:18:22 +03:00
Michael Paquier 6c77bb42ab Replace use of stat()[7] by -s switch in TAP tests to retrieve file size
The list form of stat() is an inelegant API as it relies on the position
of the file size in the list returned in result.  Like in any other
places of the tree, replace that with a -s switch instead.

Another suggestion from Dagfinn is File::Stat, which we've been already
using for some other fields.  It really comes down to a matter of taste
to choose that over -s, and the latter is more used in the tree.

Author: Bertrand Drouvot
Reviewed-by: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/b2020df7-d0fc-4ea5-b2a9-7efc6d36b2ac@gmail.com
2023-10-03 08:27:34 +09:00
Thomas Munro becfbdd6c1 Fix edge-case for xl_tot_len broken by bae868ca.
bae868ca removed a check that was still needed.  If you had an
xl_tot_len at the end of a page that was too small for a record header,
but not big enough to span onto the next page, we'd immediately perform
the CRC check using a bogus large length.  Because of arbitrary coding
differences between the CRC implementations on different platforms,
nothing very bad happened on common modern systems.  On systems using
the _sb8.c fallback we could segfault.

Restore that check, add a new assertion and supply a test for that case.
Back-patch to 12, like bae868ca.

Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLCkTT7zYjzOxuLGahBdQ%3DMcF%3Dz5ZvrjSOnW4EDhVjT-g%40mail.gmail.com
2023-09-26 10:53:38 +13:00
Thomas Munro 91b0e85aa0 Don't use Perl pack('Q') in 039_end_of_wal.pl.
'Q' for 64 bit integers turns out not to work on 32 bit Perl, as
revealed by the build farm.  Use 'II' instead, and deal with endianness.

Back-patch to 12, like bae868ca.

Discussion: https://postgr.es/m/ZQ4r1vHcryBsSi_V%40paquier.xyz
2023-09-23 14:13:06 +12:00
Thomas Munro bae868caf2 Don't trust unvalidated xl_tot_len.
xl_tot_len comes first in a WAL record.  Usually we don't trust it to be
the true length until we've validated the record header.  If the record
header was split across two pages, previously we wouldn't do the
validation until after we'd already tried to allocate enough memory to
hold the record, which was bad because it might actually be garbage
bytes from a recycled WAL file, so we could try to allocate a lot of
memory.  Release 15 made it worse.

Since 70b4f82a4b, we'd at least generate an end-of-WAL condition if the
garbage 4 byte value happened to be > 1GB, but we'd still try to
allocate up to 1GB of memory bogusly otherwise.  That was an
improvement, but unfortunately release 15 tries to allocate another
object before that, so you could get a FATAL error and recovery could
fail.

We can fix both variants of the problem more fundamentally using
pre-existing page-level validation, if we just re-order some logic.

The new order of operations in the split-header case defers all memory
allocation based on xl_tot_len until we've read the following page.  At
that point we know that its first few bytes are not recycled data, by
checking its xlp_pageaddr, and that its xlp_rem_len agrees with
xl_tot_len on the preceding page.  That is strong evidence that
xl_tot_len was truly the start of a record that was logged.

This problem was most likely to occur on a standby, because
walreceiver.c recycles WAL files without zeroing out trailing regions of
each page.  We could fix that too, but it wouldn't protect us from rare
crash scenarios where the trailing zeroes don't make it to disk.

With reliable xl_tot_len validation in place, the ancient policy of
considering malloc failure to indicate corruption at end-of-WAL seems
quite surprising, but changing that is left for later work.

Also included is a new TAP test to exercise various cases of end-of-WAL
detection by writing contrived data into the WAL from Perl.

Back-patch to 12.  We decided not to put this change into the final
release of 11.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com> (the idea, not the code)
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17928-aa92416a70ff44a2%40postgresql.org
2023-09-23 10:26:24 +12:00
Amit Kapila e0b2eed047 Flush logical slots to disk during a shutdown checkpoint if required.
It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly flushed to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart.  Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives.  As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.

The approach taken by this patch has been suggested by Ashutosh Bapat.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Michael Paquier, Ashutosh Bapat, Peter Smith, Hou Zhijie
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-09-14 08:57:05 +05:30
Thomas Munro 0174c2d213 Fix instability in 031_recovery_conflict.pl.
Where the test wants a VACUUM command to generate WAL that would
conflict with a session on the standby, it could transiently fail to do
so if it couldn't acquire a cleanup lock conditionally at that moment on
the primary.  VACUUM FREEZE will wait, so use that instead.

No back-patch for now, but that will be needed if/when the test is
re-enabled in back-branches.

Suggested-by: Andres Freund <andres@anarazel.de>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/20230812210006.ei7tutzwcr5svyt6%40awork3.anarazel.de
2023-09-07 14:38:15 +12:00
Andres Freund c66a7d75e6 Handle DROP DATABASE getting interrupted
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.

To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.

An invalid database cannot be connected to anymore, but can still be
dropped.

Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.

Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.

Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
2023-07-13 13:03:28 -07:00