Commit Graph

14 Commits

Author SHA1 Message Date
Amit Kapila
e721123b7a Fix catalog lookup with the wrong snapshot during logical decoding.
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if the transaction has modified the catalog, and that
information is not serialized to snapshot. Therefore, after the restart,
if the logical decoding decodes only the commit record of the transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.

To fix this problem, this changes the snapshot builder so that it
remembers the last-running-xacts list of the decoded RUNNING_XACTS record
after restoring the previously serialized snapshot. Then, we mark the
transaction as containing catalog changes if it's in the list of initial
running transactions and its commit record has XACT_XINFO_HAS_INVALS. To
avoid ABI breakage, we store the array of the initial running transactions
in the static variables InitialRunningXacts and NInitialRunningXacts,
instead of storing those in SnapBuild or ReorderBuffer.

This approach has a false positive; we could end up adding the transaction
that didn't change catalog to the snapshot since we cannot distinguish
whether the transaction has catalog changes only by checking the COMMIT
record. It doesn't have the information on which (sub) transaction has
catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate
that the transaction has catalog change. But that won't be a problem since
we use snapshot built during decoding only to read system catalogs.

On the master branch, we took a more future-proof approach by writing
catalog modifying transactions to the serialized snapshot which avoids the
above false positive. But we cannot backpatch it because of a change in
the SnapBuild.

Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
2022-08-11 08:55:31 +05:30
Tom Lane
94d8d8d89f Allow non-quoted identifiers as isolation test session/step names.
For no obvious reason, isolationtester has always insisted that
session and step names be written with double quotes.  This is
fairly tedious and does little for test readability, especially
since the names that people actually choose almost always look
like normal identifiers.  Hence, let's tweak the lexer to allow
SQL-like identifiers not only double-quoted strings.

(They're SQL-like, not exactly SQL, because I didn't add any
case-folding logic.  Also there's no provision for U&"..." names,
not that anyone's likely to care.)

There is one incompatibility introduced by this change: if you write
"foo""bar" with no space, that used to be taken as two identifiers,
but now it's just one identifier with an embedded quote mark.

I converted all the src/test/isolation/ specfiles to remove
unnecessary double quotes, but stopped there because my
eyes were glazing over already.

Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.

Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
2021-06-23 18:41:39 -04:00
Michael Paquier
8f32299424 Detect unused steps in isolation specs and do some cleanup
This is useful for developers to find out if an isolation spec is
over-engineered or if it needs more work by warning at the end of a
test run if a step is not used, generating a failure with extra diffs.

While on it, clean up all the specs which include steps not used in any
permutations to simplify them.

This is a backpatch of 989d23b and 06fdc4e, as it is becoming useful to
make all the branches consistent for an upcoming patch that will improve
the output generated by isolationtester.

Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6
2021-06-17 11:57:26 +09:00
Amit Kapila
bff456d7a0 Stop demanding that top xact must be seen before subxact in decoding.
Manifested as

ERROR:  subtransaction logged without previous top-level txn record

this check forbids legit behaviours like
 - First xl_xact_assignment record is beyond reading, i.e. earlier
   restart_lsn.
 - After restart_lsn there is some change of a subxact.
 - After that, there is second xl_xact_assignment (for another subxact)
   revealing the relationship between top and first subxact.

Such a transaction won't be streamed anyway because we hadn't seen it in
full.  Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.

Reported-by: Hsu, John
Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila
Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
2020-02-19 08:35:16 +05:30
Alvaro Herrera
aba2184bed Reduce cost of test_decoding's new oldest_xmin test
Change a whole-database VACUUM into doing just pg_attribute, which is
the portion that verifies what we want it to do.  The original
formulation wastes a lot of CPU time, which leads the test to fail when
runtime exceeds isolationtester timeout when it's super-slow, such as
under CLOBBER_CACHE_ALWAYS.  Per buildfarm member friarbird.

It turns out that the previous shape of the test doesn't always detect
the condition it is supposed to detect (on unpatched reorderbuffer
code): the reason is that there is a good chance of encountering a
xl_running_xacts record (logged every 15 seconds) before the checkpoint
-- and because we advance the xmin when we receive that WAL record, and
we *don't* advance the xmin twice consecutively without receiving a
client message in between, that means the xmin is not advanced enough
for the tuple to be pruned from pg_attribute by VACUUM.  So the test
would spuriously pass.

The reason this test deficiency wasn't detected earlier is that HOT
pruning removes the tuple anyway, even if vacuum leaves it in place, so
the test correctly fails (detecting the coding mistake), but for the
wrong reason.

To fix this mess, run the s0_get_changes step twice before vacuum
instead of once: this seems to cause the xmin to be advanced reliably,
wreaking havoc with more certainty.

Author: Arseny Sher
Discussion: https://postgr.es/m/87h8lkuxoa.fsf@ars-thinkpad
2018-07-05 16:40:42 -04:00
Alvaro Herrera
f49a80c481 Fix "base" snapshot handling in logical decoding
Two closely related bugs are fixed.  First, xmin of logical slots was
advanced too early.  During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them.  The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.

To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN.  This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.

The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.

Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.

Liberally sprinkle code comments, and rewrite a few existing ones.  This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
2018-06-26 16:48:10 -04:00
Peter Eisentraut
325f2ec555 Handle heap rewrites even better in logical decoding
Logical decoding should not publish anything about tables created as
part of a heap rewrite during DDL.  Those tables don't exist externally,
so consumers of logical decoding cannot do anything sensible with that
information.  In ab28feae2b, we worked
around this for built-in logical replication, but that was hack.

This is a more proper fix: We mark such transient heaps using the new
field pg_class.relwrite, linking to the original relation OID.  By
default, we ignore them in logical decoding before they get to the
output plugin.  Optionally, a plugin can register their interest in
getting such changes, if they handle DDL specially, in which case the
new field will help them get information about the actual table.

Reviewed-by: Craig Ringer <craig@2ndquadrant.com>
2018-03-21 09:15:04 -04:00
Andres Freund
955a684e04 Fix race condition leading to hanging logical slot creation.
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.

That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
   the all transaction perceived as running based on commit/abort
   records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 14:21:00 -07:00
Andres Freund
1986c3c440 Force synchronous_commit=on in test_decoding's concurrent_ddl_dml.spec.
Otherwise running installcheck-force on a server with
synchronous_commit=off will result in the tests failing. All the other
tests already do so...

Backpatch: 9.4, where logical decoding was added
2016-03-03 17:22:25 -08:00
Robert Haas
5243a35825 Remove bogus step from test_decoding isolation tests.
Commit 43b4a16817 made the isolation
tester reject duplicate step names, and it turns out that the
test_decoding module's concurrent_ddl_dml isolation test has a
duplicate name.  I think the second definition isn't actually getting
used, so just remove it.

Per buildfarm.
2015-08-14 22:40:55 -04:00
Heikki Linnakangas
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Andres Freund
ec5896aed3 Fix several weaknesses in slot and logical replication on-disk serialization.
Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and
snapbuild.c were missing the FIN_CRC32 call when computing/checking
checksums of on disk files. That doesn't lower the the error detection
capabilities of the checksum, but is inconsistent with other usages.

In a followup mail Heikki also noticed that, contrary to a comment,
the 'version' and 'length' struct fields of replication slot's on disk
data where not covered by the checksum. That's not likely to lead to
actually missed corruption as those fields are cross checked with the
expected version and the actual file length. But it's wrong
nonetheless.

As fixing these issues makes existing on disk files unreadable, bump
the expected versions of on disk files for both slots and logical
decoding historic catalog snapshots.  This means that loading old
files will fail with
ERROR: "replication slot file ... has unsupported version 1"
and
ERROR: "snapbuild state file ... has unsupported version 1 instead of
2" respectively. Given the low likelihood of anybody already using
these new features in a production setup that seems acceptable.

Fixing these issues made me notice that there's no regression test
covering the loading of historic snapshot from disk - so add one.

Backpatch to 9.4 where these features were introduced.
2014-11-12 18:52:49 +01:00
Andres Freund
d6fa44fce7 Add skip-empty-xacts option to test_decoding for use in the regression tests.
The regression tests for contrib/test_decoding regularly failed on
postgres instances that were very slow. Either because the hardware
itself was slow or because very expensive debugging options like
CLOBBER_CACHE_ALWAYS were used.

The reason they failed was just that some additional transactions were
decoded. Analyze and vacuum, triggered by autovac.

To fix just add a option to test_decoding to only display transactions
in which a change was actually displayed. That's not pretty because it
removes information from the tests; but better than constantly failing
tests in very likely harmless ways.

Backpatch to 9.4 where logical decoding was introduced.

Discussion: 20140629142511.GA26930@awork2.anarazel.de
2014-09-01 15:59:44 +02:00
Robert Haas
b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00