In the pg_rewind test suite, receive WAL fully before promoting.

If a transaction never reaches the standby, later tests find unexpected cluster state. A "tail-copy: query result matches" test failure has been the usual symptom. Among the buildfarm members having run this test suite, most have exhibited that symptom at least once. Back-patch to 9.5, where pg_rewind was introduced.

Michael Paquier, reported by Christoph Berg.
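
The idea of waiting until the standby has received and written all WAL before promoting it can be sketched in isolation. This is a minimal illustration, not the code in RewindTest.pm: `poll_until`, the simulated positions, and the 16-byte step are hypothetical stand-ins, and no server is contacted. In the actual change the condition is a query against pg_stat_replication that compares pg_current_xlog_location() with write_location.

```perl
use strict;
use warnings;

# Hypothetical helper (not the test suite's poll_query_until): call $check
# up to $max_attempts times, pausing $delay seconds between attempts, and
# return 1 as soon as it reports true.  Return 0 on timeout.
sub poll_until
{
	my ($check, $max_attempts, $delay) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return 1 if $check->();
		# Four-argument select gives a sub-second sleep.
		select(undef, undef, undef, $delay) if $delay > 0;
	}
	return 0;
}

# Simulated state: the standby's write position advances 16 bytes per poll
# until it equals the primary's current WAL position.
my $primary_lsn       = 64;
my $standby_write_lsn = 0;
my $wal_received      = sub {
	$standby_write_lsn += 16 if $standby_write_lsn < $primary_lsn;
	return $standby_write_lsn == $primary_lsn;
};

# Only promote once everything the primary generated has been written.
poll_until($wal_received, 10, 0)
  or die "Timed out while waiting for standby to receive and write WAL";
print "standby has written all WAL; safe to promote\n";
```

Polling on the write position rather than the replay position matches the reasoning in the diff: the standby may still have WAL to apply, which is fine as long as it has received everything before promotion.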
2015-09-07 19:01:00 -04:00 · 582fbffb0c
parent b1e1862a12
commit 582fbffb0c
1 changed file with 8 additions and 6 deletions
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -222,12 +222,8 @@ recovery_target_timeline='latest'
 				   '-l', "$log_path/standby.log",
 				   '-o', "-p $port_standby", 'start');

-	# Wait until the standby has caught up with the primary, by polling
-	# pg_stat_replication.
-	my $caughtup_query =
-"SELECT pg_current_xlog_location() = replay_location FROM pg_stat_replication WHERE application_name = 'rewind_standby';";
-	poll_query_until($caughtup_query, $connstr_master)
-	  or die "Timed out while waiting for standby to catch up";
+	# The standby may have WAL to apply before it matches the primary.  That
+	# is fine, because no test examines the standby before promotion.
 }

 sub promote_standby
@@ -235,6 +231,12 @@ sub promote_standby
 	#### Now run the test-specific parts to run after standby has been started
 	# up standby

+	# Wait for the standby to receive and write all WAL.
+	my $wal_received_query =
+"SELECT pg_current_xlog_location() = write_location FROM pg_stat_replication WHERE application_name = 'rewind_standby';";
+	poll_query_until($wal_received_query, $connstr_master)
+	  or die "Timed out while waiting for standby to receive and write WAL";
+
 	# Now promote slave and insert some new data on master, this will put
 	# the master out-of-sync with the standby. Wait until the standby is
 	# out of recovery mode, and is ready to accept read-write connections.