postgresql/src/test/recovery/t/023_pitr_prepared_xact.pl


# Copyright (c) 2021-2023, PostgreSQL Global Development Group

# Test for point-in-time recovery (PITR) with prepared transactions
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

# Initialize and start primary node with WAL archiving
my $node_primary = PostgreSQL::Test::Cluster->new('primary');
$node_primary->init(has_archiving => 1, allows_streaming => 1);
$node_primary->append_conf(
	'postgresql.conf', qq{
max_prepared_transactions = 10});
$node_primary->start;

# Take backup
my $backup_name = 'my_backup';
$node_primary->backup($backup_name);

# Initialize node for PITR targeting a very specific restore point, just
# after a PREPARE TRANSACTION is issued, so that we finish with a promoted
# node where this 2PC transaction needs an explicit COMMIT PREPARED.
my $node_pitr = PostgreSQL::Test::Cluster->new('node_pitr');
$node_pitr->init_from_backup(
	$node_primary, $backup_name,
	standby       => 0,
	has_restoring => 1);
$node_pitr->append_conf(
	'postgresql.conf', qq{
recovery_target_name = 'rp'
recovery_target_action = 'promote'});

# Workload with a prepared transaction and the target restore point.
$node_primary->psql(
	'postgres', qq{
CREATE TABLE foo(i int);
BEGIN;
INSERT INTO foo VALUES(1);
PREPARE TRANSACTION 'fooinsert';
SELECT pg_create_restore_point('rp');
INSERT INTO foo VALUES(2);
});

# Find next WAL segment to be archived
my $walfile_to_be_archived = $node_primary->safe_psql('postgres',
	"SELECT pg_walfile_name(pg_current_wal_lsn());");

# Make WAL segment eligible for archival
$node_primary->safe_psql('postgres', 'SELECT pg_switch_wal()');

# Wait until the WAL segment has been archived.
my $archive_wait_query =
  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
$node_primary->poll_query_until('postgres', $archive_wait_query)
  or die "Timed out while waiting for WAL segment to be archived";

# Now start the PITR node.
$node_pitr->start;

# Wait until the PITR node exits recovery.
$node_pitr->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
  or die "Timed out while waiting for PITR promotion";

# Commit the prepared transaction in the latest timeline and check its
# result.  There should only be one row in the table, coming from the
# prepared transaction.  The row from the INSERT after the restore point
# should not show up, since the recovery target precedes the second
# INSERT.
$node_pitr->psql('postgres', qq{COMMIT PREPARED 'fooinsert';});
my $result = $node_pitr->safe_psql('postgres', "SELECT * FROM foo;");
is($result, qq{1}, "check table contents after COMMIT PREPARED");

# Insert more data and do a checkpoint.  These should be generated on the
# timeline chosen after the PITR promotion.
$node_pitr->psql(
	'postgres', qq{
INSERT INTO foo VALUES(3);
CHECKPOINT;
});

# Force crash recovery with an immediate stop and restart; the checkpoint
# record generated previously should still be found.
$node_pitr->stop('immediate');
$node_pitr->start;

done_testing();

2021-03-22 00:30:53 +01:00  Fix timeline assignment in checkpoints with 2PC transactions
Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, that is shared across various parts of the system. This callback, since 1148e22a, has introduced an update of ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders.
This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record, as, by changing ThisTimeLineID, any follow-up WAL records would be written to a timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would cause a failure in finding a valid checkpoint record, resulting in a PANIC.
This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to prevent messing up with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that has led to 1148e22a, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that are working currently.
A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked.
Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself.
Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
Blamed lines: all lines of the test not listed under a later commit below.

2021-03-22 01:51:05 +01:00  Fix new TAP test for 2PC transactions and PITRs on Windows
The test added by 595b9cb forgot that on Windows it is necessary to set up pg_hba.conf (see PostgresNode::set_replication_conf) with a specific entry or base backups fail. Any node that needs to support replication just needs to pass down allows_streaming at initialization. This updates the test to do so. Simplify things a bit while on it.
Per buildfarm member fairywren. Any Windows hosts running this test would have failed, and I have reproduced the problem as well.
Backpatch-through: 10
Blamed lines: `$node_primary->init(has_archiving => 1, allows_streaming => 1);`

2021-05-07 16:56:14 +02:00  Add a copyright notice to perl files lacking one.
Blamed lines: the blank lines around the copyright notice.

2021-10-24 16:28:19 +02:00  Move Perl test modules to a better namespace
The five modules in our TAP test framework all had names in the top level namespace. This is unwise because, even though we're not exporting them to CPAN, the names can leak, for example if they are exported by the RPM build process. We therefore move the modules to the PostgreSQL::Test namespace. In the process PostgresNode is renamed to Cluster, and TestLib is renamed to Utils. PostgresVersion becomes simply PostgreSQL::Version, to avoid possible confusion about what it's the version of.
Discussion: https://postgr.es/m/aede93a4-7d92-ef26-398f-5094944c2504@dunslane.net
Reviewed by Erik Rijkers and Michael Paquier
Blamed lines: `use PostgreSQL::Test::Cluster;`, `use PostgreSQL::Test::Utils;`, and the two `PostgreSQL::Test::Cluster->new(...)` calls.

2022-02-11 20:54:44 +01:00  Replace Test::More plans with done_testing
Rather than doing manual book keeping to plan the number of tests to run in each TAP suite, conclude each run with done_testing() summing up the number of tests that ran. This removes the need for maintaining and updating the plan count at the expense of an accurate count of remaining during the test suite runtime.
This patch has been discussed a number of times, often in the context of other patches which updates tests, so a larger number of discussions can be found in the archives.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/DD399313-3D56-4666-8079-88949DAC870F@yesql.se
Blamed lines: `use Test::More;`, `done_testing();`

2023-01-02 21:00:37 +01:00  Update copyright for 2023
Backpatch-through: 11
Blamed lines: `# Copyright (c) 2021-2023, PostgreSQL Global Development Group`

2023-05-02 05:23:08 +02:00  Fix typos in comments
The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
Blamed lines: `# Test for point-in-time recovery (PITR) with prepared transactions`