Stop archive recovery if WAL generated with wal_level=minimal is found.

Previously if hot standby was enabled, archive recovery exited with
an error when it found WAL generated with wal_level=minimal.
But if hot standby was disabled, it just reported a warning and
continued in that case. Which could lead to data loss or errors
during normal operation. A warning was emitted, but users could
easily miss that and not notice this serious situation until
they encountered the actual errors.

To improve this situation, this commit changes archive recovery
so that it exits with FATAL error when it finds WAL generated with
wal_level=minimal whatever the setting of hot standby. This enables
users to notice the serious situation soon.

The FATAL error is thrown if archive recovery starts from a base
backup taken before wal_level is changed to minimal. When archive
recovery exits with the error, if users have a base backup taken
after setting wal_level to higher than minimal, they can recover
the database by starting archive recovery from that newer backup.
But note that if such backup doesn't exist, there is no easy way to
complete archive recovery, which may make the database server
unstartable and users may lose whole database. The commit adds
the note about this risk into the document.

Even in the case of unstartable database server, previously by just
disabling hot standby users could avoid the error during archive
recovery, forcibly start up the server and salvage data from it.
But note that this commit makes this procedure unavailable at all.

Author: Takamichi Osumi
Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao
Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
This commit is contained in:
Fujii Masao 2021-04-06 22:56:51 +09:00
parent c4c393b3ec
commit 9de9294b0c
4 changed files with 106 additions and 9 deletions

View File

@ -2720,6 +2720,10 @@ include_dir 'conf.d'
data from a base backup and the WAL logs, so <literal>replica</literal> or
higher must be used to enable WAL archiving
(<xref linkend="guc-archive-mode"/>) and streaming replication.
Note that changing <varname>wal_level</varname> to
<literal>minimal</literal> makes any base backups taken before
unavailable for archive recovery and standby server, which may
lead to database loss.
</para>
<para>
In <literal>logical</literal> level, the same information is logged as

View File

@ -1745,7 +1745,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
<xref linkend="guc-wal-level"/> to <literal>minimal</literal>,
<xref linkend="guc-archive-mode"/> to <literal>off</literal>, and
<xref linkend="guc-max-wal-senders"/> to zero.
But note that changing these settings requires a server restart.
But note that changing these settings requires a server restart,
and makes any base backups taken before unavailable for archive
recovery and standby server, which may lead to database loss.
</para>
<para>

View File

@ -6403,9 +6403,10 @@ CheckRequiredParameterValues(void)
*/
if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
{
ereport(WARNING,
(errmsg("WAL was generated with wal_level=minimal, data may be missing"),
errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
ereport(FATAL,
(errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
errdetail("This happens if you temporarily set wal_level=minimal on the server."),
errhint("Use a backup taken after setting wal_level to higher than minimal.")));
}
/*
@ -6414,11 +6415,6 @@ CheckRequiredParameterValues(void)
*/
if (ArchiveRecoveryRequested && EnableHotStandby)
{
if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
ereport(ERROR,
(errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));
/* We ignore autovacuum_max_workers when we make this test. */
RecoveryRequiresIntParameter("max_connections",
MaxConnections,

View File

@ -0,0 +1,95 @@
# Test for archive recovery of WAL generated with wal_level=minimal
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 2;
use Time::HiRes qw(usleep);
# Initialize and start node with wal_level = replica and WAL archiving
# enabled.
my $node = get_new_node('orig');
$node->init(has_archiving => 1);
my $replica_config = q[
wal_level = replica
archive_mode = on
max_wal_senders = 10
hot_standby = off
];
$node->append_conf('postgresql.conf', $replica_config);
$node->start;
# Take backup
my $backup_name = 'my_backup';
$node->backup($backup_name);
# Restart node with wal_level = minimal and WAL archiving disabled
# to generate WAL with that setting. Note that such WAL has not been
# archived yet at this moment because WAL archiving is not enabled.
$node->append_conf(
'postgresql.conf', q[
wal_level = minimal
archive_mode = off
max_wal_senders = 0
]);
$node->restart;
# Restart node with wal_level = replica and WAL archiving enabled
# to archive WAL previously generated with wal_level = minimal.
# We ensure the WAL file containing the record indicating the change
# of wal_level to minimal is archived by checking pg_stat_archiver.
$node->append_conf('postgresql.conf', $replica_config);
$node->restart;
# Find next WAL segment to be archived
my $walfile_to_be_archived = $node->safe_psql('postgres',
"SELECT pg_walfile_name(pg_current_wal_lsn());");
# Make WAL segment eligible for archival
$node->safe_psql('postgres', 'SELECT pg_switch_wal()');
my $archive_wait_query
= "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
# Wait until the WAL segment has been archived.
$node->poll_query_until('postgres', $archive_wait_query)
or die "Timed out while waiting for WAL segment to be archived";
$node->stop;
# Initialize new node from backup, and start archive recovery. Check that
# archive recovery fails with an error when it detects the WAL record
# indicating the change of wal_level to minimal and node stops.
sub test_recovery_wal_level_minimal
{
my ($node_name, $node_text, $standby_setting) = @_;
my $recovery_node = get_new_node($node_name);
$recovery_node->init_from_backup(
$node, $backup_name,
has_restoring => 1, standby => $standby_setting);
# Use run_log instead of recovery_node->start because this test expects
# that the server ends with an error during recovery.
run_log(
['pg_ctl','-D', $recovery_node->data_dir, '-l',
$recovery_node->logfile, 'start']);
# Wait up to 180s for postgres to terminate
foreach my $i (0 .. 1800)
{
last if !-f $recovery_node->data_dir . '/postmaster.pid';
usleep(100_000);
}
# Confirm that the archive recovery fails with an expected error
my $logfile = slurp_file($recovery_node->logfile());
ok( $logfile =~
qr/FATAL: WAL was generated with wal_level=minimal, cannot continue recovering/,
"$node_text ends with an error because it finds WAL generated with wal_level=minimal");
}
# Test for archive recovery
test_recovery_wal_level_minimal('archive_recovery', 'archive recovery', 0);
# Test for standby server
test_recovery_wal_level_minimal('standby', 'standby', 1);