postgresql/src/test/perl
Thomas Munro bae868caf2 Don't trust unvalidated xl_tot_len.
xl_tot_len comes first in a WAL record.  Usually we don't trust it to be
the true length until we've validated the record header.  If the record
header was split across two pages, previously we wouldn't do the
validation until after we'd already tried to allocate enough memory to
hold the record, which was bad because it might actually be garbage
bytes from a recycled WAL file, so we could try to allocate a lot of
memory.  Release 15 made it worse.

Since 70b4f82a4b, we'd at least generate an end-of-WAL condition if the
garbage 4 byte value happened to be > 1GB, but we'd still try to
allocate up to 1GB of memory bogusly otherwise.  That was an
improvement, but unfortunately release 15 tries to allocate another
object before that, so you could get a FATAL error and recovery could
fail.

We can fix both variants of the problem more fundamentally using
pre-existing page-level validation, if we just re-order some logic.

The new order of operations in the split-header case defers all memory
allocation based on xl_tot_len until we've read the following page.  At
that point we know that its first few bytes are not recycled data, by
checking its xlp_pageaddr, and that its xlp_rem_len agrees with
xl_tot_len on the preceding page.  That is strong evidence that
xl_tot_len was truly the start of a record that was logged.

This problem was most likely to occur on a standby, because
walreceiver.c recycles WAL files without zeroing out trailing regions of
each page.  We could fix that too, but it wouldn't protect us from rare
crash scenarios where the trailing zeroes don't make it to disk.

With reliable xl_tot_len validation in place, the ancient policy of
considering malloc failure to indicate corruption at end-of-WAL seems
quite surprising, but changing that is left for later work.

Also included is a new TAP test to exercise various cases of end-of-WAL
detection by writing contrived data into the WAL from Perl.

Back-patch to 12.  We decided not to put this change into the final
release of 11.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com> (the idea, not the code)
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17928-aa92416a70ff44a2%40postgresql.org
2023-09-23 10:26:24 +12:00
PostgreSQL Don't trust unvalidated xl_tot_len. 2023-09-23 10:26:24 +12:00
Makefile Add missing uninstallation rule for BackgroundPsql.pm 2023-05-02 09:41:03 +02:00
meson.build Fix missing installation rules for BackgroundPsql.pm 2023-04-26 11:40:01 +02:00
README Make PG_TEST_NOCLEAN work for temporary directories in TAP tests 2023-07-03 10:06:04 +09:00

Perl-based TAP tests
====================

src/test/perl/ contains shared infrastructure that's used by Perl-based tests
across the source tree, particularly tests in src/bin and src/test. It's used
to drive tests for backup and restore, replication, etc - anything that can't
really be expressed using pg_regress or the isolation test framework.

The tests are invoked via perl's 'prove' command, wrapped in PostgreSQL
makefiles to handle instance setup etc. See the $(prove_check) and
$(prove_installcheck) targets in Makefile.global. By default every test in the
t/ subdirectory is run. Individual test(s) can be run instead by passing
something like PROVE_TESTS="t/001_testname.pl t/002_othertestname.pl" to make.

By default, to keep the noise low during runs, we do not set any flags via
PROVE_FLAGS, but this can be done on the 'make' command line if desired, e.g.:

    make check-world PROVE_FLAGS='--verbose'

When a test fails, the terminal output from 'prove' is usually not sufficient
to diagnose the problem.  Look into the log files that are left under
tmp_check/log/ to get more info.  Files named 'regress_log_XXX' are log
output from the perl test scripts themselves, and should be examined first.
Other files are postmaster logs, and may be helpful as additional data.

The tests default to a timeout of 180 seconds for many individual operations.
Slow hosts may avoid load-induced, spurious failures by setting environment
variable PG_TEST_TIMEOUT_DEFAULT to some number of seconds greater than 180.
Developers may see faster failures by setting that environment variable to
some lesser number of seconds.
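
As a rough illustration of the default-and-override behavior described above,
a helper could resolve the timeout as sketched below.  The effective_timeout()
name is hypothetical and is not part of the shared test infrastructure, which
has its own handling of this variable; only the variable name and the
180-second default come from this README.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: return the timeout in seconds,
# honoring PG_TEST_TIMEOUT_DEFAULT when it is set to a non-empty value and
# falling back to the documented default of 180 otherwise.
sub effective_timeout
{
    my ($env) = @_;

    my $timeout = $env->{PG_TEST_TIMEOUT_DEFAULT};
    $timeout = 180 if !defined $timeout || $timeout eq '';

    return $timeout;
}

print "timeout: ", effective_timeout(\%ENV), " seconds\n";
```

Passing the environment as a hash reference keeps the helper easy to exercise
with different settings without touching the real environment.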

Data directories will also be left behind for analysis when a test fails;
they are named according to the test filename.  But if the environment
variable PG_TEST_NOCLEAN is set, the data directories will be retained
regardless of test status.  This environment variable also prevents the
test's temporary directories from being removed.


Writing tests
-------------

You should prefer to write tests using pg_regress in src/test/regress, or
isolation tester specs in src/test/isolation, if possible. If not, check to
see if your new tests make sense under an existing tree in src/test, like
src/test/ssl, or should be added to one of the suites for an existing utility.

Note that all tests and test tools should have perltidy run on them before
patches are submitted, using:

    perltidy --profile=src/tools/pgindent/perltidyrc

Tests are written using Perl's Test::More with some PostgreSQL-specific
infrastructure from src/test/perl providing node management, support for
invoking 'psql' to run queries and get results, etc. You should read the
documentation for Test::More before trying to write tests.

Test scripts in the t/ subdirectory of a suite are executed in alphabetical
order.

Each test script should begin with:

    use strict;
    use warnings;
    use PostgreSQL::Test::Cluster;
    use PostgreSQL::Test::Utils;
    use Test::More;

then it will generally need to set up one or more nodes, run commands
against them and evaluate the results. For example:

    my $node = PostgreSQL::Test::Cluster->new('primary');
    $node->init;
    $node->start;

    my $ret = $node->safe_psql('postgres', 'SELECT 1');
    is($ret, '1', 'SELECT 1 returns 1');

    $node->stop('fast');

Each test script should end with:

    done_testing();

Test::Builder::Level controls how far up the call stack a test looks when
reporting the location of a failure.  It should be incremented by any
subroutine that directly or indirectly calls test routines from Test::More,
such as ok() or is():

    local $Test::Builder::Level = $Test::Builder::Level + 1;
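
To show where that line belongs, here is a minimal sketch that uses only
Test::More.  The is_integer_ok() helper is hypothetical and exists only for
this example; with the level incremented, a failure would be reported at the
line that calls the helper rather than at the like() call inside it.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: pass if the value looks like a non-negative integer.
sub is_integer_ok
{
    my ($value, $test_name) = @_;

    # Report failures at our caller's line, not at the like() call below.
    local $Test::Builder::Level = $Test::Builder::Level + 1;

    return like($value, qr/^\d+$/, $test_name);
}

is_integer_ok('42', 'value is an integer');

done_testing();
```

Because the assignment uses local, the previous level is restored
automatically when the helper returns.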

Read the documentation for more on how to write tests:

    perldoc Test::More
    perldoc Test::Builder

For available PostgreSQL-specific test methods and some example tests read the
perldoc for the test modules, e.g.:

    perldoc src/test/perl/PostgreSQL/Test/Cluster.pm

Portability
-----------

Avoid using any bleeding-edge Perl features.  We have buildfarm animals
running Perl versions as old as 5.14, so your tests will be expected
to pass on that.

Also, do not use any non-core Perl modules except IPC::Run.  Or, if you
must do so for a particular test, arrange to skip the test when the needed
module isn't present.  If unsure, you can consult Module::CoreList to find
out whether a given module is part of the Perl core, and which module
versions shipped with which Perl releases.
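
One possible way to arrange both checks is sketched below, using only
Test::More.  The module_available() helper and the module name
Hypothetical::NonCore::Module are made up for illustration; substitute the
module your test really needs.  The Module::CoreList lookup is guarded because
some distributions package that module separately.

```perl
use strict;
use warnings;
use Test::More;

# Return true if the named module can be loaded by this Perl.
sub module_available
{
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical non-core dependency, named only for illustration.
my $optional = 'Hypothetical::NonCore::Module';

SKIP:
{
    skip "$optional is not installed", 1
      unless module_available($optional);

    ok(1, "check that depends on $optional");
}

# Module::CoreList reports the first Perl release that shipped a module,
# or undef if the module has never been part of the core.
if (module_available('Module::CoreList'))
{
    my $first = Module::CoreList->first_release('Test::More');
    note("Test::More first shipped with perl "
          . (defined $first ? $first : '(not core)'));
}

done_testing();
```

If the whole script depends on the module, calling plan with skip_all before
running any tests is an alternative to a SKIP block.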

One way to test for compatibility with old Perl versions is to use
perlbrew; see http://perlbrew.pl .  After installing that, do

    export PERLBREW_CONFIGURE_FLAGS='-de -Duseshrplib'
    perlbrew --force install 5.14.0
    perlbrew use 5.14.0
    perlbrew install-cpanm
    cpanm install Test::Simple@0.98
    cpanm install IPC::Run@0.79
    cpanm install ExtUtils::MakeMaker@6.50  # downgrade

TIP: if Test::Simple's utf8 regression test hangs up, try setting a
UTF8-compatible locale, e.g. "export LANG=en_US.utf8".

Then re-run Postgres' configure to ensure the correct Perl is used when
running tests.  To verify that the right Perl was found:

    grep ^PERL= config.log

Due to limitations of cpanm, this recipe doesn't exactly duplicate the
module list of older buildfarm animals.  The discrepancies should seldom
matter, but if you want to be sure, bypass cpanm and instead manually
install the desired versions of Test::Simple and IPC::Run.