From 8a4237908c0fe73dd41d4d7c7a6314f17dfd7a6f Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 4 Oct 2021 14:05:20 +0900
Subject: [PATCH] Fix snapshot builds during promotion of hot standby node with 2PC

Some specific logic is done at the end of recovery when involving 2PC
transactions:
1) Call RecoverPreparedTransactions(), to recover the state of 2PC
transactions into memory (re-acquire locks, etc.).
2) ShutdownRecoveryTransactionEnvironment(), to move back to normal
operations, mainly cleaning up recovery locks and KnownAssignedXids
(including any 2PC transaction tracked previously).
3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is
the tipping point for any process calling RecoveryInProgress() to check
if the cluster is still in recovery or not.

Any snapshot taken between steps 2) and 3) would be empty, causing any
transaction relying on a snapshot at this point to potentially corrupt
data as there could still be some 2PC transactions to track, with
RecentXmin moving backwards on successive calls to GetSnapshotData() in
the same transaction.

As SharedRecoveryState is the point to take into account to know if it
is safe to discard KnownAssignedXids, this commit moves step 2) after
step 3), so as we can never finish with empty snapshots.

This exists since the introduction of hot standby, so backpatch all the
way down. The window with incorrect snapshots is extremely small, but I
have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm
member fairywren. Thomas Munro also found it independently. Special
thanks to Andres Freund for taking the time to analyze this issue.

Reported-by: Thomas Munro, Michael Paquier
Analyzed-by: Andres Freund
Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de
Backpatch-through: 9.6
---
 src/backend/access/transam/xlog.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f8c714b7b7..eddb13d13a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8111,13 +8111,6 @@ StartupXLOG(void)
 	/* Reload shared-memory state for prepared transactions */
 	RecoverPreparedTransactions();
 
-	/*
-	 * Shutdown the recovery environment. This must occur after
-	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
-	 */
-	if (standbyState != STANDBY_DISABLED)
-		ShutdownRecoveryTransactionEnvironment();
-
 	/* Shut down xlogreader */
 	if (readFile >= 0)
 	{
@@ -8165,6 +8158,18 @@ StartupXLOG(void)
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
 
+	/*
+	 * Shutdown the recovery environment. This must occur after
+	 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
+	 * and after switching SharedRecoveryState to RECOVERY_STATE_DONE so as
+	 * any session building a snapshot will not rely on KnownAssignedXids as
+	 * RecoveryInProgress() would return false at this stage. This is
+	 * particularly critical for prepared 2PC transactions, that would still
+	 * need to be included in snapshots once recovery has ended.
+	 */
+	if (standbyState != STANDBY_DISABLED)
+		ShutdownRecoveryTransactionEnvironment();
+
 	/*
 	 * If there were cascading standby servers connected to us, nudge any wal
 	 * sender processes to notice that we've been promoted.