Commit Graph

1414 Commits

Author SHA1 Message Date
Amit Kapila 75bfe7434d Fix stale values in partition map entries on subscribers.
We build the partition map entries on subscribers while applying the
changes for update/delete on partitions. The component relation in each
entry is closed after its use so we need to update it on successive use of
cache entries.

This problem was there since the original commit f1ac27bfda that
introduced this code but we didn't notice it till the recent commit
26b3455afa started to use the component relation of partition map cache
entry.

Reported-by: Tom Lane, as per buildfarm
Author: Amit Langote, Hou Zhijie
Reviewed-by: Amit Kapila, Shi Yu
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
2022-06-21 15:39:35 +05:30
Amit Kapila 26b3455afa Fix partition table's REPLICA IDENTITY checking on the subscriber.
In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not for the
partition tables. This leads to assertion failure while applying changes
for update/delete as we expect those to succeed only when the
corresponding partition table has a primary key or has a replica
identity defined.

Fix it by checking the replica identity of the partition table while
applying changes.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
2022-06-21 08:07:43 +05:30
Amit Kapila b7658c24c7 Fix data inconsistency between publisher and subscriber.
We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated for partition tables after the publisher has
changed the table schema.

Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
2022-06-16 08:45:07 +05:30
Amit Kapila 5a97b13254 Fix cache look-up failures while applying changes in logical replication.
While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
2022-06-15 09:52:12 +05:30
Tom Lane bf4717b091 Fix off-by-one loop termination condition in pg_stat_get_subscription().
pg_stat_get_subscription scanned one more LogicalRepWorker array entry
than is really allocated.  In the worst case this could lead to SIGSEGV,
if the LogicalRepCtx data structure is near the end of shared memory.
That seems quite unlikely though (thanks to the ordering of calls in
CreateSharedMemoryAndSemaphores) and we've heard no field reports of it.
A more likely misbehavior is one row of garbage data in the function's
result, but even that is not real likely because of the check that the
pid field matches some live backend.

Report and fix by Kuntal Ghosh.  This bug is old, so back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAGz5QCJykEDzW6jQK6Yz7Qh_PMtD=95de_7QoocbVR2Qy8hWZA@mail.gmail.com
2022-06-07 15:34:30 -04:00
Amit Kapila fd0b9dcebd Prohibit combining publications with different column lists.
Currently, we simply combine the column lists when publishing tables on
multiple publications and that can sometimes lead to unexpected behavior.
Say, if a column is published in any row-filtered publication, then the
values for that column are sent to the subscriber even for rows that don't
match the row filter, as long as the row matches the row filter for any
other publication, even if that other publication doesn't include the
column.

The main purpose of introducing a column list is to have statically
different shapes on publisher and subscriber or hide sensitive column
data. In both cases, it doesn't seem to make sense to combine column
lists.

So, we disallow the cases where the column list is different for the same
table when combining publications. It can be later extended to combine the
column lists for selective cases where required.

Reported-by: Alvaro Herrera
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
2022-06-02 08:31:50 +05:30
Amit Kapila 0ff20288e1 Extend pg_publication_tables to display column list and row filter.
Commit 923def9a53 and 52e4f0cd47 allowed to specify column lists and row
filters for publication tables. This commit extends the
pg_publication_tables view and pg_get_publication_tables function to
display that information.

This information will be useful to users and we also need this for the
later commit that prohibits combining multiple publications with different
column lists for the same table.

Author: Hou Zhijie
Reviewed By: Amit Kapila, Alvaro Herrera, Shi Yu, Takamichi Osumi
Discussion: https://postgr.es/m/202204251548.mudq7jbqnh7r@alvherre.pgsql
2022-05-19 08:20:55 +05:30
Michael Paquier bbf7c2d9e9 Fix typo in walreceiver.c
s/primary_slotname/primary_slot_name/.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACX3=pHkCpoGG-z+O=7Gp5YZv70jmfTyGnNV7YF3SkK73g@mail.gmail.com
2022-05-18 09:06:22 +09:00
Alvaro Herrera c4f113e8fe
Clean up newlines following left parentheses
Like commit c9d2977519.
2022-05-13 23:52:35 +02:00
Peter Eisentraut 30ed71e423 Indent C code in flex and bison files
In the style of pgindent, done semi-manually.

Discussion: https://www.postgresql.org/message-id/flat/7d062ecc-7444-23ec-a159-acd8adf9b586%40enterprisedb.com
2022-05-13 07:17:29 +02:00
Andres Freund 0cf16cb8ca Don't report stats in LogicalRepApplyLoop() when in xact.
pgstat_report_stat() is only supposed to be called outside of transactions. In
5891c7a8ed I added a pgstat_report_stat() call into LogicalRepApplyLoop()'s
timeout branch. While not commonly reached inside a transaction, it is
reachable (e.g. due to network bottlenecks or the sender being stalled / slow
for some reason).

To fix, add a !IsTransactionState() check.

No test added because there's no easy way to reproduce this case without
patching the code.

Reported-By: Erik Rijkers <er@xs4all.nl>
Discussion: https://postgr.es/m/b3463b8c-2328-dcac-0136-af95715493c1@xs4all.nl
2022-05-12 18:54:26 -07:00
Andres Freund 07d683b54a Remove function declaration for function in pg_proc.
The declaration is automatically generated. Noticed when experimenting with
adding PGDLLIMPORT markers for functions.

Discussion: https://postgr.es/m/20220512164513.vaheofqp2q24l65r@alap3.anarazel.de
2022-05-12 12:39:33 -07:00
Andres Freund 09cd33f47b Add 'static' to file-local variables missing it.
Noticed when comparing the set of exported symbols without / with
-fvisibility=hidden after adding PGDLLIMPORT to intentionally exported
symbols.

Discussion: https://postgr.es/m/20220512164513.vaheofqp2q24l65r@alap3.anarazel.de
2022-05-12 12:39:33 -07:00
Tom Lane 23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
Andres Freund b5f44225b8 Mark a few 'bbsink' related functions / variables static.
Discussion: https://postgr.es/m/20220506234924.6mxxotl3xl63db3l@alap3.anarazel.de
2022-05-12 09:11:31 -07:00
Michael Paquier 45edde037e Fix typos and grammar in code and test comments
This fixes the grammar of some comments in a couple of tests (SQL and
TAP), and in some C files.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220511020334.GH19626@telsasoft.com
2022-05-11 15:38:55 +09:00
Amit Kapila f95d53eded Fix the logical replication timeout during large transactions.
The problem is that we don't send keep-alive messages for a long time
while processing large transactions during logical replication where we
don't send any data of such transactions. This can happen when the table
modified in the transaction is not published or because all the changes
got filtered. We do try to send the keep_alive if necessary at the end of
the transaction (via WalSndWriteData()) but by that time the
subscriber-side can timeout and exit.

To fix this we try to send the keepalive message if required after
processing certain threshold of changes.

Reported-by: Fabrice Chapuis
Author: Wang wei and Amit Kapila
Reviewed By: Masahiko Sawada, Euler Taveira, Hou Zhijie, Hayato Kuroda
Backpatch-through: 10
Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com
2022-05-11 11:11:44 +05:30
Michael Paquier 59a32f0093 Fix typo in origin.c
Introduced in 5aa2350.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PsuWz6_7aCmivNU5FahgQxDUTQtc3+__XnWkBzQcfn43w@mail.gmail.com
2022-05-06 20:01:15 +09:00
John Naylor e84f82ab5c Fix SQL syntax in comment in logical/worker.c
Euler Taveira

Disussion: https://www.postgresql.org/message-id/25f95189-eef8-43c4-9d7b-419b651963c8%40www.fastmail.com
2022-04-28 09:27:32 +07:00
Michael Paquier 83cca409ed Remove duplicated word in comment of basebackup.c
Oversight in 39969e2.

Author: Martín Marqués
Discussion: https://postgr.es/m/CABeG9LviA01oHC5h=ksLUuhMyXxmZR_tftRq6q3341CMT=j=4g@mail.gmail.com
2022-04-20 11:05:34 +09:00
Amit Kapila dd4ab6fd65 Fix the check to limit sync workers.
We don't allow to invoke more sync workers once we have reached the sync
worker limit per subscription. But the check to enforce this also doesn't
allow to launch an apply worker if it gets restarted.

This code was introduced by commit de43897122 but we caught the problem
only with the test added by recent commit c91f71b9dc which started failing
occasionally in the buildfarm.

As per buildfarm.
Diagnosed-by: Amit Kapila, Masahiko Sawada, Tomas Vondra
Author: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
	    https://postgr.es/m/f90d2b03-4462-ce95-a524-d91464e797c8@enterprisedb.com
2022-04-19 08:49:49 +05:30
Tom Lane 6fea65508a Tighten ComputeXidHorizons' handling of walsenders.
ComputeXidHorizons (nee GetOldestXmin) thought that it could identify
walsenders by checking for proc->databaseId == 0.  Perhaps that was
safe when the code was written, but it's been wrong at least since
autovacuum was invented.  Background processes that aren't connected
to any particular database, such as the autovacuum launcher and
logical replication launcher, look like that too.

This imprecision is harmful because when such a process advertises an
xmin, the result is to hold back dead-tuple cleanup in all databases,
though it'd be sufficient to hold it back in shared catalogs (which
are the only relations such a process can access).  Aside from being
generally inefficient, this has recently been seen to cause regression
test failures in the buildfarm, as a consequence of the logical
replication launcher's startup transaction preventing VACUUM from
marking pages of a user table as all-visible.

We only want that global hold-back effect for the case where a
walsender is advertising a hot standby feedback xmin.  Therefore,
invent a new PGPROC flag that says that a process' xmin should be
considered globally, and check that instead of using the incorrect
databaseId == 0 test.  Currently only a walsender sets that flag,
and only if it is not connected to any particular database.  (This is
for bug-compatibility with the undocumented behavior of the existing
code, namely that feedback sent by a client who has connected to a
particular database would not be applied globally.  I'm not sure this
is a great definition; however, such a client is capable of issuing
plain SQL commands, and I don't think we want xmins advertised for
such commands to be applied globally.  Perhaps this could do with
refinement later.)

While at it, I rewrote the comment in ComputeXidHorizons, and
re-ordered the commented-upon if-tests, to make them match up
for intelligibility's sake.

This is arguably a back-patchable bug fix, but given the lack of
complaints I think it prudent to let it age awhile in HEAD first.

Discussion: https://postgr.es/m/1346227.1649887693@sss.pgh.pa.us
2022-04-15 17:50:05 -04:00
Alvaro Herrera 24d2b2680a
Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Michael Paquier a4b57543ac Rename backup_compression.{c,h} to compression.{c,h}
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.

pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup.  This cleanup needs to be done
before releasing a beta of 15.  pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.

Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
2022-04-12 13:38:54 +09:00
David Rowley b0e5f02ddc Fix various typos and spelling mistakes in code comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
2022-04-11 20:49:41 +12:00
Michael Paquier 7597cc3083 Fix the dates of some copyright notices
0ad8032 and 4e34747 are at the origin of that.  Julien has found the one
in parse_jsontable.c, while I have spotted the rest.

Author: Julien Rouhaud, Michael Paquier
Discussion: https://postgr.es/m/20220411060838.ftnzyvflpwu6f74w@jrouhaud
2022-04-11 16:36:25 +09:00
Tomas Vondra 2c7ea57e56 Revert "Logical decoding of sequences"
This reverts a sequence of commits, implementing features related to
logical decoding and replication of sequences:

 - 0da92dc530
 - 80901b3291
 - b779d7d8fd
 - d5ed9da41d
 - a180c2b34d
 - 75b1521dae
 - 2d2232933b
 - 002c9dd97a
 - 05843b1aa4

The implementation has issues, mostly due to combining transactional and
non-transactional behavior of sequences. It's not clear how this could
be fixed, but it'll require reworking significant part of the patch.

Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com
2022-04-07 20:06:36 +02:00
Jeff Davis 5c279a6d35 Custom WAL Resource Managers.
Allow extensions to specify a new custom resource manager (rmgr),
which allows specialized WAL. This is meant to be used by a Table
Access Method or Index Access Method.

Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.

Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
2022-04-06 23:06:46 -07:00
Andres Freund 6f0cf87872 pgstat: remove stats_temp_directory.
With stats now being stored in shared memory, the GUC isn't needed
anymore. However, the pg_stat_tmp directory and PG_STAT_TMP_DIR define are
kept, as pg_stat_statements (and some out-of-core extensions) store data in
it.

Docs will be updated in a subsequent commit, together with the other pending
docs updates due to shared memory stats.

Author: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220330233550.eiwsbearu6xhuqwe@alap3.anarazel.de
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-04-06 21:29:46 -07:00
Andres Freund 5891c7a8ed pgstat: store statistics in shared memory.
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.

Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.

The header for pgstat.c contains an overview of the architecture.

The stats collector is not needed anymore, remove it.

By utilizing the transactional statistics drop infrastructure introduced in a
prior commit statistics entries cannot "leak" anymore. Previously leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations pgstat_vacuum_stat() could be quite
expensive.

Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.

Subsequent commits will perform some further code cleanup, adapt docs and add
tests.

Bumps PGSTAT_FILE_FORMAT_ID.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version)
Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version)
Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version)
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
2022-04-06 21:29:46 -07:00
Andres Freund e41aed674f pgstat: revise replication slot API in preparation for shared memory stats.
Previously the pgstat <-> replication slots API was done with on the basis of
names. However, the upcoming move to storing stats in shared memory makes it
more convenient to use a integer as key.

Change the replication slot functions to take the slot rather than the slot
name, and expose ReplicationSlotIndex() to compute the index of an replication
slot. Special handling will be required for restarts, as the index is not
stable across restarts. For now pgstat internally still uses names.

Rename pgstat_report_replslot_{create,drop}() to
pgstat_{create,drop}_replslot() to match the functions for other kinds of
stats.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
2022-04-06 18:38:24 -07:00
Andres Freund bdbd3d9064 pgstat: stats collector references in comments.
Soon the stats collector will be no more, with statistics instead getting
stored in shared memory. There are a lot of references to the stats collector
in comments. This commit replaces most of these references with "cumulative
statistics system", with the remaining ones getting replaced as part of
subsequent commits.

This is done separately from the - quite large - shared memory statistics
patch to make review easier.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
2022-04-06 13:56:06 -07:00
Stephen Frost 39969e2a1e Remove exclusive backup mode
Exclusive-mode backups have been deprecated since 9.6 (when
non-exclusive backups were introduced) due to the issues
they can cause should the system crash while one is running and
generally because non-exclusive provides a much better interface.
Further, exclusive backup mode wasn't really being tested (nor was most
of the related code- like being able to log in just to stop an exclusive
backup and the bits of the state machine related to that) and having to
possibly deal with an exclusive backup and the backup_label file
existing during pg_basebackup, pg_rewind, etc, added other complexities
that we are better off without.

This patch removes the exclusive backup mode, the various special cases
for dealing with it, and greatly simplifies the online backup code and
documentation.

Authors: David Steele, Nathan Bossart
Reviewed-by: Chapman Flack
Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net
https://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
2022-04-06 14:41:03 -04:00
Amit Kapila 2d09e44d30 Improve comments for row filtering and toast interaction in logical replication.
Reported-by: Antonin Houska
Author: Amit Kapila
Reviewed-by: Antonin Houska, Ajin Cherian
Discussion: https://postgr.es/m/84638.1649152255@antos
2022-04-06 08:20:40 +05:30
David Rowley 1b0d9aa4f7 Improve the generation memory allocator
Here we make a series of improvements to the generation memory
allocator, namely:

1. Allow generation contexts to have a minimum, initial and maximum block
sizes. The standard allocator allows this already but when the generation
context was added, it only allowed fixed-sized blocks.  The problem with
fixed-sized blocks is that it's difficult to choose how large to make the
blocks.  If the chosen size is too small then we'd end up with a large
number of blocks and a large number of malloc calls. If the block size is
made too large, then memory is wasted.

2. Add support for "keeper" blocks.  This is a special block that is
allocated along with the context itself but is never freed.  Instead,
when the last chunk in the keeper block is freed, we simply mark the block
as empty to allow new allocations to make use of it.

3. Add facility to "recycle" newly empty blocks instead of freeing them
and having to later malloc an entire new block again.  We do this by
recording a single GenerationBlock which has become empty of any chunks.
When we run out of space in the current block, we check to see if there is
a "freeblock" and use that if it contains enough space for the allocation.

Author: David Rowley, Tomas Vondra
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/d987fd54-01f8-0f73-af6c-519f799a0ab8@enterprisedb.com
2022-04-04 20:53:13 +12:00
Joe Conway 9752436f04 Use has_privs_for_roles for predefined role checks: round 2
Similar to commit 6198420ad, replace is_member_of_role with
has_privs_for_role for predefined role access checks in recently
committed basebackup code. In passing fix a double-word error
in a nearby comment.

Discussion: https://postgr.es/m/flat/CAGB+Vh4Zv_TvKt2tv3QNS6tUM_F_9icmuj0zjywwcgVi4PAhFA@mail.gmail.com
2022-04-02 13:24:38 -04:00
Robert Haas 51c0d186d9 Allow parallel zstd compression when taking a base backup.
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.

When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.

Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.

Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
2022-03-30 09:41:26 -04:00
Amit Kapila d5a9d86d8f Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
subscriber even if the transaction is empty. This can happen because
transaction doesn't contain changes from the selected publications or all
the changes got filtered. It is a waste of CPU cycles and network
bandwidth to build/transmit these empty transactions.

This patch addresses the above problem by postponing the BEGIN message
until the first change is sent. While processing a COMMIT message, if
there was no other change for that transaction, do not send the COMMIT
message. This allows us to skip sending BEGIN/COMMIT messages for empty
transactions.

When skipping empty transactions in synchronous replication mode, we send
a keepalive message to avoid delaying such transactions.

Author: Ajin Cherian, Hou Zhijie, Euler Taveira
Reviewed-by: Peter Smith, Takamichi Osumi, Shi Yu, Masahiko Sawada, Greg Nancarrow, Vignesh C, Amit Kapila
Discussion: https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
2022-03-30 07:41:05 +05:30
Joe Conway 6198420ad8 Use has_privs_for_roles for predefined role checks
Generally if a role is granted membership to another role with NOINHERIT
they must use SET ROLE to access the privileges of that role, however
with predefined roles the membership and privilege is conflated. Fix that
by replacing is_member_of_role with has_privs_for_role for predefined
roles. Patch does not remove is_member_of_role from acl.h, but it does
add a warning not to use that function for privilege checking. Not
backpatched based on hackers list discussion.

Author: Joshua Brindle
Reviewed-by: Stephen Frost, Nathan Bossart, Joe Conway
Discussion: https://postgr.es/m/flat/CAGB+Vh4Zv_TvKt2tv3QNS6tUM_F_9icmuj0zjywwcgVi4PAhFA@mail.gmail.com
2022-03-28 15:10:04 -04:00
Robert Haas 61762426e6 Fix a few goofs in new backup compression code.
When we try to set the zstd compression level either on the client
or on the server, check for errors.

For any algorithm, on the client side, don't try to set the compression
level unless the user specified one. This was visibly broken for
zstd, which managed to set -1 rather than 0 in this case, but tidy
up the code for the other methods, too.

On the client side, if we fail to create a ZSTD_CCtx, exit after
reporting the error. Otherwise we'll dereference a null pointer.

Patch by me, reviewed by Dipesh Pandit.

Discussion: http://postgr.es/m/CA+TgmoZK3zLQUCGi1h4XZw4jHiAWtcACc+GsdJR1_Mc19jUjXA@mail.gmail.com
2022-03-28 12:19:05 -04:00
Andres Freund 91c0570a79 Don't fail for > 1 walsenders in 019_replslot_limit, add debug messages.
So far the first of the retries introduced in f28bf667f6 resolves the
issue. But I (Andres) am still suspicious that the start of the failures might
indicate a problem.

To reduce noise, stop reporting a failure if a retry resolves the problem. To
allow figuring out what causes the slow slot drop, add a few more debug
messages to ReplicationSlotDropPtr.

See also commit afdeff1052, fe0972ee5e and f28bf667f6.

Discussion: https://postgr.es/m/20220327213219.smdvfkq2fl74flow@alap3.anarazel.de
2022-03-27 22:35:42 -07:00
Tomas Vondra 923def9a53 Allow specifying column lists for logical replication
This allows specifying an optional column list when adding a table to
logical replication. The column list may be specified after the table
name, enclosed in parentheses. Columns not included in this list are not
sent to the subscriber, allowing the schema on the subscriber to be a
subset of the publisher schema.

For UPDATE/DELETE publications, the column list needs to cover all
REPLICA IDENTITY columns. For INSERT publications, the column list is
arbitrary and may omit some REPLICA IDENTITY columns. Furthermore, if
the table uses REPLICA IDENTITY FULL, column list is not allowed.

The column list can contain only simple column references. Complex
expressions, function calls etc. are not allowed. This restriction could
be relaxed in the future.

During the initial table synchronization, only columns included in the
column list are copied to the subscriber. If the subscription has
several publications, containing the same table with different column
lists, columns specified in any of the lists will be copied.

This means all columns are replicated if the table has no column list
at all (which is treated as column list with all columns), or when of
the publications is defined as FOR ALL TABLES (possibly IN SCHEMA that
matches the schema of the table).

For partitioned tables, publish_via_partition_root determines whether
the column list for the root or the leaf relation will be used. If the
parameter is 'false' (the default), the list defined for the leaf
relation is used. Otherwise, the column list for the root partition
will be used.

Psql commands \dRp+ and \d <table-name> now display any column lists.

Author: Tomas Vondra, Alvaro Herrera, Rahila Syed
Reviewed-by: Peter Eisentraut, Alvaro Herrera, Vignesh C, Ibrar Ahmed,
Amit Kapila, Hou zj, Peter Smith, Wang wei, Tang, Shi yu
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
2022-03-26 01:01:27 +01:00
Tomas Vondra 05843b1aa4 Minor improvements in sequence decoding code and docs
A couple minor comment improvements and code cleanups, based on
post-commit feedback to the sequence decoding patch.

Author: Amit Kapila, vignesh C
Discussion: https://postgr.es/m/aeb2ba8d-e6f4-5486-cc4c-0d4982c291cb@enterprisedb.com
2022-03-25 21:07:17 +01:00
Tomas Vondra 75b1521dae Add decoding of sequences to built-in replication
This commit adds support for decoding of sequences to the built-in
replication (the infrastructure was added by commit 0da92dc530).

The syntax and behavior mostly mimics handling of tables, i.e. a
publication may be defined as FOR ALL SEQUENCES (replicating all
sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating
all sequences in a particular schema) or individual sequences.

To publish sequence modifications, the publication has to include
'sequence' action. The protocol is extended with a new message,
describing sequence increments.

A new system view pg_publication_sequences lists all the sequences
added to a publication, both directly and indirectly. Various psql
commands (\d and \dRp) are improved to also display publications
including a given sequence, or sequences included in a publication.

Author: Tomas Vondra, Cary Huang
Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres
             Freund, Petr Jelinek
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-03-24 18:49:27 +01:00
Robert Haas 591767150f pg_basebackup: Try to fix some failures on Windows.
Commit ffd53659c4 messed up the
mechanism that was being used to pass parameters to LogStreamerMain()
on Windows. It worked on Linux because only Windows was using threads.
Repair by moving the additional parameters added by that commit into
the 'logstreamer_param' struct.

Along the way, fix a compiler warning on builds without HAVE_LIBZ.

Discussion: http://postgr.es/m/CA+TgmoY5=AmWOtMj3v+cySP2rR=Bt6EGyF_joAq4CfczMddKtw@mail.gmail.com
2022-03-23 13:25:26 -04:00
Robert Haas 607e75e8f0 Unbreak the build.
Commit ffd53659c4 broke the build for
anyone not compiling with LZ4 and ZSTD enabled. Woops.
2022-03-23 10:22:54 -04:00
Robert Haas ffd53659c4 Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.

This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.

Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.

Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
2022-03-23 09:19:14 -04:00
Amit Kapila 208c5d65bb Add ALTER SUBSCRIPTION ... SKIP.
This feature allows skipping the transaction on subscriber nodes.

If incoming change violates any constraint, logical replication stops
until it's resolved. Currently, users need to either manually resolve the
conflict by updating a subscriber-side database or by using function
pg_replication_origin_advance() to skip the conflicting transaction. This
commit introduces a simpler way to skip the conflicting transactions.

The user can specify LSN by ALTER SUBSCRIPTION ... SKIP (lsn = XXX),
which allows the apply worker to skip the transaction finished at
specified LSN. The apply worker skips all data modification changes within
the transaction.

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, Hou Zhijie, Peter Eisentraut, Amit Kapila, Shi Yu, Vignesh C, Greg Nancarrow, Haiying Tang, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2022-03-22 07:11:19 +05:30
Magnus Hagander c540d37157 Fix typo in file identification
Clearly a simple copy/paste mistake when the file was created.
2022-03-21 12:35:48 +02:00
Thomas Munro 3f1ce97346 Add circular WAL decoding buffer, take II.
Teach xlogreader.c to decode the WAL into a circular buffer.  This will
support optimizations based on looking ahead, to follow in a later
commit.

 * XLogReadRecord() works as before, decoding records one by one, and
   allowing them to be examined via the traditional XLogRecGetXXX()
   macros and certain traditional members like xlogreader->ReadRecPtr.

 * An alternative new interface XLogReadAhead()/XLogNextRecord() is
   added that returns pointers to DecodedXLogRecord objects so that it's
   now possible to look ahead in the WAL stream while replaying.

 * In order to be able to use the new interface effectively while
   streaming data, support is added for the page_read() callback to
   respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of
   waiting for more data to arrive.

No direct user of the new interface is included in this commit, though
XLogReadRecord() uses it internally.  Existing code doesn't need to
change, except in a few places where it was accessing reader internals
directly and now needs to go through accessor macros.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
2022-03-18 18:45:47 +13:00
Tomas Vondra 5a07966225 Fix row filters with multiple publications
When publishing changes through a artition root, we should use the row
filter for the top-most ancestor. The relation may be added to multiple
publications, using different ancestors, and 52e4f0cd47 handled this
incorrectly. With c91f71b9dc we find the correct top-most ancestor, but
the code tried to fetch the row filter from all publications, including
those using a different ancestor etc. No row filter can be found for
such publications, which was treated as replicating all rows.

Similarly to c91f71b9dc, this seems to be a rare issue in practice. It
requires multiple publications including the same partitioned relation,
through different ancestors.

Fixed by only passing publications containing the top-most ancestor to
pgoutput_row_filter_init(), so that treating a missing row filter as
replicating all rows is correct.

Report and fix by me, test case by Hou zj. Reviews and improvements by
Amit Kapila.

Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
2022-03-17 17:03:48 +01:00
Tomas Vondra c91f71b9dc Fix publish_as_relid with multiple publications
Commit 83fd4532a7 allowed publishing of changes via ancestors, for
publications defined with publish_via_partition_root. But the way
the ancestor was determined in get_rel_sync_entry() was incorrect,
simply updating the same variable. So with multiple publications,
replicating different ancestors, the outcome depended on the order
of publications in the list - the value from the last loop was used,
even if it wasn't the top-most ancestor.

This is a probably rare situation, as in most cases publications do
not overlap, so each partition has exactly one candidate ancestor
to replicate as and there's no ambiguity.

Fixed by tracking the "ancestor level" for each publication, and
picking the top-most ancestor. Adds a test case, verifying the
correct ancestor is used for publishing the changes and that this
does not depend on order of publications in the list.

Older releases have another bug in this loop - once all actions are
replicated, the loop is terminated, on the assumption that inspecting
additional publications is unecessary. But that misses the fact that
those additional applications may replicate different ancestors.

Fixed by removal of this break condition. We might still terminate the
loop in some cases (e.g. when replicating all actions and the ancestor
is the partition root).

Backpatch to 13, where publish_via_partition_root was introduced.

Initial report and fix by me, test added by Hou zj. Reviews and
improvements by Amit Kapila.

Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
2022-03-16 18:05:58 +01:00
Robert Haas d0083c1d2a Suppress compiler warnings.
Michael Paquier

Discussion: http://postgr.es/m/YjGvq4zPDT6j15go@paquier.xyz
2022-03-16 09:26:48 -04:00
Robert Haas 8ef1fa3ee0 Remove accidentally-committed file. 2022-03-15 13:41:36 -04:00
Robert Haas e4ba69f3f4 Allow extensions to add new backup targets.
Commit 3500ccc39b allowed for base backup
targets, meaning that we could do something with the backup other than
send it to the client, but all of those targets had to be baked in to
the core code. This commit makes it possible for extensions to define
additional backup targets.

Patch by me, reviewed by Abhijit Menon-Sen.

Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com
2022-03-15 13:22:04 -04:00
Robert Haas 75eae09087 Change HAVE_LIBLZ4 and HAVE_LIBZSTD tests to USE_LZ4 and USE_ZSTD.
These tests were added recently, but older code tests USE_LZ4 rathr
than HAVE_LIBLZ4, so let's follow the established precedent. It
also seems more consistent with the intent of the configure tests,
since I think that the USE_* symbols are intended to correspond to
what the user requested, and the HAVE_* symbols to what configure
found while probing.

Discussion: http://postgr.es/m/CA+Tgmoap+hTD2-QNPJLH4tffeFE8MX5+xkbFKMU3FKBy=ZSNKA@mail.gmail.com
2022-03-15 13:06:25 -04:00
Amit Kapila 695f459f17 Fix compiler warning introduced in commit 705e20f855.
Reported-by: Nathan Bossart
Author: Nathan Bossart
Reviewed-by: Osumi Takamichi
Discussion : https://postgr.es/m/20220314230424.GA1085716@nathanxps13
2022-03-15 08:11:17 +05:30
Michael Paquier 6bdf1a1400 Fix collection of typos in the code and the documentation
Some words were duplicated while other places were grammatically
incorrect, including one variable name in the code.

Author: Otto Kekalainen, Justin Pryzby
Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com
2022-03-15 11:29:35 +09:00
Amit Kapila 705e20f855 Optionally disable subscriptions on error.
Logical replication apply workers for a subscription can easily get stuck
in an infinite loop of attempting to apply a change, triggering an error
(such as a constraint violation), exiting with the error written to the
subscription server log, and restarting.

To partially remedy the situation, this patch adds a new subscription
option named 'disable_on_error'. To be consistent with old behavior, this
option defaults to false. When true, both the tablesync worker and apply
worker catch any errors thrown and disable the subscription in order to
break the loop. The error is still also written in the logs.

Once the subscription is disabled, users can either manually resolve the
conflict/error or skip the conflicting transaction by using
pg_replication_origin_advance() function. After resolving the conflict,
users need to enable the subscription to allow apply process to proceed.

Author: Osumi Takamichi and Mark Dilger
Reviewed-by: Greg Nancarrow, Vignesh C, Amit Kapila, Wang wei, Tang Haiying, Peter Smith, Masahiko Sawada, Shi Yu
Discussion : https://postgr.es/m/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com
2022-03-14 09:32:40 +05:30
Robert Haas 1d4be6be65 Fix LZ4 tests for remaining buffer space.
We should flush the buffer when the remaining space is less than
the maximum amount that we might need, not when it is less than or
equal to the maximum amount we might need.

Jeevan Ladhe, per an observation from me.

Discussion: http://postgr.es/m/CANm22CgVMa85O1akgs+DOPE8NSrT1zbz5_vYfS83_r+6nCivLQ@mail.gmail.com
2022-03-08 10:05:55 -05:00
Robert Haas 7cf085f077 Add support for zstd base backup compression.
Both client-side compression and server-side compression are now
supported for zstd. In addition, a backup compressed by the server
using zstd can now be decompressed by the client in order to
accommodate the use of -Fp.

Jeevan Ladhe, with some edits by me.

Discussion: http://postgr.es/m/CA+Tgmobyzfbz=gyze2_LL1ZumZunmaEKbHQxjrFkOR7APZGu-g@mail.gmail.com
2022-03-08 09:52:43 -05:00
Amit Kapila d3e8368c4b Add the additional information to the logical replication worker errcontext.
This commits adds both the finish LSN (commit_lsn in case transaction got
committed, prepare_lsn in case of a prepared transaction, etc.) and
replication origin name to the existing error context message.

This will help users in specifying the origin name and transaction finish
LSN to pg_replication_origin_advance() SQL function to skip a particular
transaction.

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, Euler Taveira, and Amit Kapila
Discussion: https://postgr.es/m/CAD21AoBarBf2oTF71ig2g_o=3Z_Dt6_sOpMQma1kFgbnA5OZ_w@mail.gmail.com
2022-03-08 08:08:32 +05:30
Tomas Vondra d5ed9da41d Call ReorderBufferProcessXid from sequence_decode
Commit 0da92dc530 added sequence_decode() implementing logical decoding
of sequences, but it failed to call ReorderBufferProcessXid() as it
should. So add the missing call.

Reported-by: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KGn6cQqJEsubOOENwQOANsExiV2sKL52r4U10J8NJEMQ%40mail.gmail.com
2022-03-07 20:53:16 +01:00
Amit Kapila 5e0e99a80b Make the errcontext message in logical replication worker translation friendly.
Previously, the message for logical replication worker errcontext is
incrementally built, which was not translation friendly.  Instead, we use
complete sentences with if-else branches.

We also remove the commit timestamp from the context message since it's
not important information and made the message long.

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, and Amit Kapila
Discussion: https://postgr.es/m/CAD21AoBarBf2oTF71ig2g_o=3Z_Dt6_sOpMQma1kFgbnA5OZ_w@mail.gmail.com
2022-03-07 08:33:58 +05:30
Michael Paquier 9e98583898 Create routine able to set single-call SRFs for Materialize mode
Set-returning functions that use the Materialize mode, creating a
tuplestore to include all the tuples returned in a set rather than doing
so in multiple calls, use roughly the same set of steps to prepare
ReturnSetInfo for this job:
- Check if ReturnSetInfo supports returning a tuplestore and if the
materialize mode is enabled.
- Create a tuplestore for all the tuples part of the returned set in the
per-query memory context, stored in ReturnSetInfo->setResult.
- Build a tuple descriptor mostly from get_call_result_type(), then
stored in ReturnSetInfo->setDesc.  Note that there are some cases where
the SRF's tuple descriptor has to be the one specified by the function
caller.

This refactoring is done so as there are (well, should be) no behavior
changes in any of the in-core functions refactored, and the centralized
function that checks and sets up the function's ReturnSetInfo can be
controlled with a set of bits32 options.  Two of them prove to be
necessary now:
- SRF_SINGLE_USE_EXPECTED to use expectedDesc as tuple descriptor, as
expected by the function's caller.
- SRF_SINGLE_BLESS to validate the tuple descriptor for the SRF.

The same initialization pattern is simplified in 28 places per my
count as of src/backend/, shaving up to ~900 lines of code.  These
mostly come from the removal of the per-query initializations and the
sanity checks now grouped in a single location.  There are more
locations that could be simplified in contrib/, that are left for a
follow-up cleanup.

fcc2817, 07daca5 and d61a361 have prepared the areas of the code related
to this change, to ease this refactoring.

Author: Melanie Plageman, Michael Paquier
Reviewed-by: Álvaro Herrera, Justin Pryzby
Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
2022-03-07 10:26:29 +09:00
Amit Kapila 7a85073290 Reconsider pg_stat_subscription_workers view.
It was decided (refer to the Discussion link below) that the stats
collector is not an appropriate place to store the error information of
subscription workers.

This patch changes the pg_stat_subscription_workers view (introduced by
commit 8d74fc96db) so that it stores only statistics counters:
apply_error_count and sync_error_count, and has one entry for
each subscription. The removed error information such as error-XID and
the error message would be stored in another way in the future which is
more reliable and persistent.

After removing these error details, there is no longer any relation
information, so the subscription statistics are now a cluster-wide
statistics.

The patch also changes the view name to pg_stat_subscription_stats since
the word "worker" is an implementation detail that we use one worker for
one tablesync and one apply.

Author: Masahiko Sawada, based on suggestions by Andres Freund
Reviewed-by: Peter Smith, Haiying Tang, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/20220125063131.4cmvsxbz2tdg6g65@alap3.anarazel.de
2022-03-01 06:17:52 +05:30
Andres Freund d33aeefd9b Fix warning on mingw due to pid_t width, introduced in fe0972ee5e. 2022-02-26 16:07:07 -08:00
Amit Kapila a89850a57e Fix typo in logicalfuncs.c.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACX1mVtw8LWEnZgnpPdk2bPFR1xX2ZN+8GfXCffyip_9=Q@mail.gmail.com
2022-02-26 10:38:37 +05:30
Andres Freund fe0972ee5e Add further debug info to help debug 019_replslot_limit.pl failures.
See also afdeff1052. Failures after that commit provided a few more hints,
but not yet enough to understand what's going on.

In 019_replslot_limit.pl shut down nodes with fast instead of immediate mode
if we observe the failure mode. That should tell us whether the failures we're
observing are just a timing issue under high load. PGCTLTIMEOUT should prevent
buildfarm animals from hanging endlessly.

Also adds a bit more logging to replication slot drop and ShutdownPostgres().

Discussion: https://postgr.es/m/20220225192941.hqnvefgdzaro6gzg@alap3.anarazel.de
2022-02-25 17:04:39 -08:00
Andres Freund afdeff1052 Add temporary debug info to help debug 019_replslot_limit.pl failures.
I have not been able to reproduce the occasional failures of
019_replslot_limit.pl we are seeing in the buildfarm and not for lack of
trying. The additional logging and increased log level will hopefully help.

Will be reverted once the cause is identified.

Discussion: https://postgr.es/m/20220218231415.c4plkp4i3reqcwip@alap3.anarazel.de
2022-02-22 18:02:34 -08:00
Amit Kapila 52e4f0cd47 Allow specifying row filters for logical replication of tables.
This feature adds row filtering for publication tables. When a publication
is defined or modified, an optional WHERE clause can be specified. Rows
that don't satisfy this WHERE clause will be filtered out. This allows a
set of tables to be partially replicated. The row filter is per table. A
new row filter can be added simply by specifying a WHERE clause after the
table name. The WHERE clause must be enclosed by parentheses.

The row filter WHERE clause for a table added to a publication that
publishes UPDATE and/or DELETE operations must contain only columns that
are covered by REPLICA IDENTITY. The row filter WHERE clause for a table
added to a publication that publishes INSERT can use any column. If the
row filter evaluates to NULL, it is regarded as "false". The WHERE clause
only allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined types, user-defined collations,
non-immutable built-in functions, or references to system columns. These
restrictions could be addressed in the future.

If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber. If the subscription
has several publications in which a table has been published with
different WHERE clauses, rows that satisfy ANY of the expressions will be
copied. If a subscriber is a pre-15 version, the initial table
synchronization won't use row filters even if they are defined in the
publisher.

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters (for the same publish operation), those
expressions get OR'ed together so that rows satisfying any of the
expressions will be replicated.

This means all the other filters become redundant if (a) one of the
publications have no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

If your publication contains a partitioned table, the publication
parameter publish_via_partition_root determines if it uses the partition's
row filter (if the parameter is false, the default) or the root
partitioned table's row filter.

Psql commands \dRp+ and \d <table-name> will display any row filters.

Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian
Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 08:11:50 +05:30
Amit Kapila c476f380e2 Fix a comment in worker.c.
The comment incorrectly states that worker gets killed during
ALTER SUBSCRIPTION ... DISABLE. Remove that part of the comment.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCbEN==oH7BhP3U6WPHg3zgH6sDOeKhJjy4W2dx-qoVCw@mail.gmail.com
2022-02-18 07:46:51 +05:30
Michael Paquier d61a361d1a Remove all traces of tuplestore_donestoring() in the C code
This routine is a no-op since dd04e95 from 2003, with a macro kept
around for compatibility purposes.  This has led to the same code
patterns being copy-pasted around for no effect, sometimes in confusing
ways like in pg_logical_slot_get_changes_guts() from logical.c where the
code was actually incorrect.

This issue has been discussed on two different threads recently, so
rather than living with this legacy, remove any uses of this routine in
the C code to simplify things.  The compatibility macro is kept to avoid
breaking any out-of-core modules that depend on it.

Reported-by: Tatsuhito Kasahara, Justin Pryzby
Author: Tatsuhito Kasahara
Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com
Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com
2022-02-17 09:52:02 +09:00
Heikki Linnakangas 70e81861fa Split xlog.c into xlog.c and xlogrecovery.c.
This moves the functions related to performing WAL recovery into the new
xlogrecovery.c source file, leaving xlog.c responsible for maintaining
the WAL buffers, coordinating the startup and switch from recovery to
normal operations, and other miscellaneous stuff that have always been in
xlog.c.

Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16 09:30:38 +02:00
Andres Freund 2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00
Peter Eisentraut cfc7191dfe Move scanint8() to numutils.c
Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2022-02-14 21:57:26 +01:00
Tom Lane 302612a6c7 Silence minor compiler warnings.
Depending on compiler version and optimization level, we might
get a complaint that lazy_scan_heap's "freespace" is used
uninitialized.

Compilers not aware that ereport(ERROR) doesn't return complained
about bbsink_lz4_new().

Assigning "-1" to a uint64 value has unportable results; fortunately,
the value of xlogreadsegno is unimportant when xlogreadfd is -1.
(It looks to me like there is no need for xlogreadsegno to be static
in the first place, but I didn't venture to change that.)
2022-02-13 13:06:55 -05:00
Robert Haas dab298471f Add suport for server-side LZ4 base backup compression.
LZ4 compression can be a lot faster than gzip compression, so users
may prefer it even if the compression ratio is not as good. We will
want pg_basebackup to support LZ4 compression and decompression on the
client side as well, and there is a pending patch for that, but it's
by a different author, so I am committing this part separately for
that reason.

Jeevan Ladhe, reviewed by Tushar Ahuja and by me.

Discussion: http://postgr.es/m/CANm22Cg9cArXEaYgHVZhCnzPLfqXCZLAzjwTq7Fc0quXRPfbxA@mail.gmail.com
2022-02-11 08:29:38 -05:00
Tomas Vondra 0da92dc530 Logical decoding of sequences
This extends the logical decoding to also decode sequence increments.
We differentiate between sequences created in the current (in-progress)
transaction, and sequences created earlier. This mixed behavior is
necessary because while sequences are not transactional (increments are
not subject to ROLLBACK), relfilenode changes are. So we do this:

* Changes for sequences created in the same top-level transaction are
  treated as transactional, i.e. just like any other change from that
  transaction, and discarded in case of a rollback.

* Changes for sequences created earlier are applied immediately, as if
  performed outside any transaction. This applies also after ALTER
  SEQUENCE, which may create a new relfilenode.

Moreover, if we ever get support for DDL replication, the sequence
won't exist until the transaction gets applied.

Sequences created in the current transaction are tracked in a simple
hash table, identified by a relfilenode. That means a sequence may
already exist, but if a transaction does ALTER SEQUENCE then the
increments for the new relfilenode will be treated as transactional.

For each relfilenode we track the XID of (sub)transaction that created
it, which is needed for cleanup at transaction end. We don't need to
check the XID to decide if an increment is transactional - if we find a
match in the hash table, it has to be the same transaction.

This requires two minor changes to WAL-logging. Firstly, we need to
ensure the sequence record has a valid XID - until now the the increment
might have XID 0 if it was the first change in a subxact. But the
sequence might have been created in the same top-level transaction. So
we ensure the XID is assigned when WAL-logging increments.

The other change is addition of "created" flag, marking increments for
newly created relfilenodes. This makes it easier to maintain the hash
table of sequences that need transactional handling.
Note: This is needed because of subxacts. A XID 0 might still have the
sequence created in a different subxact of the same top-level xact.

This does not include any changes to test_decoding and/or the built-in
replication - those will be committed in separate patches.

A patch adding decoding of sequences was originally submitted by Cary
Huang. This commit reworks various important aspects (e.g. the WAL
logging and transactional/non-transactional handling). However, the
original patch and reviews were very useful.

Author: Tomas Vondra, Cary Huang
Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-02-10 18:43:51 +01:00
Robert Haas 0d4513b613 Remove server support for the previous base backup protocol.
Commit cc333f3233 added a new COPY
sub-protocol for taking base backups, but retained support for the
previous protocol. For the same reasons articulated in the message
for commit 9cd28c2e5f, remove support
for the previous protocol from the server.

Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
2022-02-10 12:12:43 -05:00
Robert Haas 9cd28c2e5f Remove server support for old BASE_BACKUP command syntax.
Commit 0ba281cb4b added a new syntax
for the BASE_BACKUP command, with extensible options, but maintained
support for the legacy syntax. This isn't important for PostgreSQL,
where pg_basebackup works with older server versions but not newer
ones, but it could in theory matter for out-of-core users of the
replication protocol.

Discussion on pgsql-hackers, however, suggests that no one is aware
of any out-of-core use of the BASE_BACKUP command, and the consensus
is in favor of removing support for the old syntax to simplify the
code, so do that.

Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
2022-02-10 10:48:33 -05:00
Amit Kapila 7f481b8d38 Improve invalidation handling in pgoutput.c.
Fix the following issues in pgoutput.c:

* rel_sync_cache_relation_cb does the wrong thing when called for a cache
flush (i.e., relid == 0). Instead of invalidating all RelationSyncCache
entries as it should, it does nothing.

* When rel_sync_cache_relation_cb does invalidate an entry, it immediately
zaps the entry->map structure, even though that might still be in use. We
instead just mark the entry as invalid and rebuild it at a later safe
point.

* Similarly, rel_sync_cache_publication_cb is way too eager to reset the
pubactions flags, which would likely lead to failing to transmit changes
that we should transmit. In this case also, we just mark the entry as
invalid and rebuild it at a later safe point.

Author: Tom Lane
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/885288.1641420714@sss.pgh.pa.us
2022-02-04 07:30:40 +05:30
Robert Haas 8e2b6d45a0 Fix server crash bug in 'server' backup target.
When this code executed as superuser it appeared to work because no
system catalog lookups happened, but otherwise it crashes because there
is no transaction environment. Fix that.

Report and code change by me. Test case by Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/CA+TgmobiKLXne-2AVzYyWRiO8=rChBQ=7ywoxp=2SmcFw=oDDw@mail.gmail.com
2022-02-02 13:50:33 -05:00
Alvaro Herrera b3d7d6e462
Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Robert Haas 7f6772317b Adjust server-side backup to depend on pg_write_server_files.
I had made it depend on superuser, but that seems clearly inferior.
Also document the permissions requirement in the straming replication
protocol section of the documentation, rather than only in the
section having to do with pg_basebackup.

Idea and patch from Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/87bkzw160u.fsf@wibble.ilmari.org
2022-01-28 12:31:40 -05:00
Robert Haas 71cbbbbe80 pg_basebackup: Add a dummy return to bbsink_gzip_new().
Apparently, this is needed to avoid warnings on MVCC.

David Rowley

Discussion: http://postgr.es/m/CAApHDvosHkgyo_PZs7CSB4Kgs2ey4FdmFpcK0N_QOci9DJ=wnw@mail.gmail.com
2022-01-27 14:20:18 -05:00
Michael Paquier 410aa248e5 Fix various typos, grammar and code style in comments and docs
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas.  Most fixes are related to some recent
additions, as of the development of v15.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-25 09:40:04 +09:00
Tom Lane 6aa5186146 Fix limitations on what SQL commands can be issued to a walsender.
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
2022-01-24 15:33:38 -05:00
Robert Haas 0ad8032910 Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.

To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.

At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.

Along the way, I removed the information message added by commit
5c649fe153 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.

Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4b.

This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b) and for
pg_verifybackup itself.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 15:13:18 -05:00
Tom Lane 3c06ec6d14 Remember to reset yy_start state when firing up repl_scanner.l.
Without this, we get odd behavior when the previous cycle of
lexing exited in a non-default exclusive state.  Every other
copy of this code is aware that it has to do BEGIN(INITIAL),
but repl_scanner.l did not get that memo.

The real-world impact of this is probably limited, since most
replication clients would abandon their connection after getting
a syntax error.  Still, it's a bug.

This mistake is old, so back-patch to all supported branches.

Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us
2022-01-24 12:09:46 -05:00
Tom Lane 353708e1fb Clean up recent Coverity complaints.
Commit 5c649fe15 introduced a memory leak into pg_basebackup's
parse_compress_options.  (I simplified nearby code while at it.)

Commit 9a974cbcb introduced a memory leak into pg_dump's
binary_upgrade_set_pg_class_oids.

Coverity also complained about a call of SnapBuildProcessChange that
ignored the result, unlike every other call of that function.  This
is evidently intentional, so add a (void) cast to indicate that.
(It's also old, dating to b89e15105; I suppose the reason it showed
up now is 7a5f6b474's recent rearrangement of nearby code.)
2022-01-23 12:51:38 -05:00
Robert Haas 3500ccc39b Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command.  If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.

On the server side, we now support two additional types of backup
targets.  There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.

Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-20 10:46:33 -05:00
Jeff Davis 7a5f6b4748 Make logical decoding a part of the rmgr.
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.

In preparation for rmgr extensibility.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
2022-01-19 14:58:49 -08:00
Robert Haas 0f47e833bf Fix alignment problem with bbsink_copystream buffer.
bbsink_copystream wants to store a type byte just before the buffer,
but basebackup.c wants the buffer to be aligned so that it can call
PageIsNew() and PageGetLSN() on it. Therefore, instead of inserting
1 extra byte before the buffer, insert MAXIMUM_ALIGNOF extra bytes
and only use the last one.

On most machines this doesn't cause any problem (except perhaps for
performance) but some buildfarm machines with -fsanitize=alignment
dump core.

Discussion: http://postgr.es/m/CA+TgmoYx5=1A2K9JYV-9zdhyokU4KKTyNQ9q7CiXrX=YBBMWVw@mail.gmail.com
2022-01-19 08:12:08 -05:00
Robert Haas cc333f3233 Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.

The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.

The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.

Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 13:47:49 -05:00
Peter Eisentraut 941460fcf7 Add Boolean node
Before, SQL-level boolean constants were represented by a string with
a cast, and internal Boolean values in DDL commands were usually
represented by Integer nodes.  This takes the place of both of these
uses, making the intent clearer and having some amount of type safety.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8c1a2e37-c68d-703c-5a83-7a6077f4f997@enterprisedb.com
2022-01-17 10:38:23 +01:00
Michael Paquier b69aba7457 Improve error handling of cryptohash computations
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions),
could fail only under this circumstance.

The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.

This commit extends the cryptohash APIs so as callers of those routines
can fetch more context when an error happens, by using a new routine
called pg_cryptohash_error().  The error states are stored within each
implementation's internal context data, so as it is possible to extend
the logic depending on what's suited for an implementation.  The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state so more is needed in
cryptohash_openssl.c, and the code is shaped so as we are always able to
grab the necessary information.

The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details.  This way, any future changes would warn if attempting to
free these strings.  The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related that, while on it.

The origin of the problem is 87ae969, that introduced the centralized
cryptohash facility.  Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.

Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
2022-01-11 09:55:16 +09:00
Jeff Davis 96a6f11c06 More cleanup of a2ab9c06ea.
Require SELECT privileges when performing UPDATE or DELETE, to be
consistent with the way a normal UPDATE or DELETE command works.

Simplify subscription test it so that it runs faster. Also, wait for
initial table sync to complete to avoid intermittent failures.

Minor doc fixup.

Discussion: https://postgr.es/m/CAA4eK1L3-qAtLO4sNGaNhzcyRi_Ufmh2YPPnUjkROBK0tN%3Dx%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/1514479.1641664638%40sss.pgh.pa.us
Discussion: https://postgr.es/m/Ydkfj5IsZg7mQR0g@paquier.xyz
2022-01-08 20:07:16 -08:00
Jeff Davis a2ab9c06ea Respect permissions within logical replication.
Prevent logical replication workers from performing insert, update,
delete, truncate, or copy commands on tables unless the subscription
owner has permission to do so.

Prevent subscription owners from circumventing row-level security by
forbidding replication into tables with row-level security policies
which the subscription owner is subject to, without regard to whether
the policy would ordinarily allow the INSERT, UPDATE, DELETE or
TRUNCATE which is being replicated.  This seems sufficient for now, as
superusers, roles with bypassrls, and target table owners should still
be able to replicate despite RLS policies.  We can revisit the
question of applying row-level security policies on a per-row basis if
this restriction proves too severe in practice.

Author: Mark Dilger
Reviewed-by: Jeff Davis, Andrew Dunstan, Ronan Dunklau
Discussion: https://postgr.es/m/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53%40enterprisedb.com
2022-01-07 17:40:56 -08:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Michael Paquier 6ce16088bf Reduce relcache access in WAL sender streaming logical changes
get_rel_sync_entry(), which is called each time a change needs to be
logically replicated, is a rather hot code path in the WAL sender
sending logical changes.  This code path was doing a relcache access on
relkind and relpartition for each logical change, but we only need to
know this information when building or re-building the cached
information for a relation.

Some measurements prove that this is noticeable in perf profiles,
particularly when attempting to replicate changes from relations that
are not published as these cause less overhead in the WAL sender,
delaying further the replication of changes for relations that are
published.

Issue introduced in 83fd453.

Author: Hou Zhijie
Reviewed-by: Kyotaro Horiguchi, Euler Taveira
Discussion: https://postgr.es/m/OS0PR01MB5716E863AA9E591C1F010F7A947D9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Backpatch-through: 13
2022-01-05 10:27:07 +09:00
Michael Paquier 5d08137076 Fix some typos with {a,an}
One of the changes impacts the documentation, so backpatch.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pu6+c+r3mY24VT7u+H+E_s6vMr5OdRiZ8NT3EOa-E5Lmw@mail.gmail.com
Backpatch-through: 14
2021-12-09 15:20:36 +09:00
Amit Kapila e464cb7af3 Fix origin timestamp during decoding of ROLLBACK PREPARED operation.
This happens because we were passing incorrect arguments to
ReorderBufferFinishPrepared().

Author: Masahiko Sawada
Reviewed-by: Vignesh C
Backpatch-through: 14
Discussion: https://postgr.es/m/CAD21AoBqhUqgDZUhUVnnwKRubPDNJ6m6fJDPgok3E5cWJLL+pA@mail.gmail.com
2021-12-08 15:18:56 +05:30
Michael Paquier 7799d4e3bd Fix comment grammar in slotfuncs.c
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUkrNR2xTak+QaqxoTjPKGn8zXWripv7SR27t+Q5qF1Wg@mail.gmail.com
2021-12-01 20:28:19 +09:00
Amit Kapila 8d74fc96db Add a view to show the stats of subscription workers.
This commit adds a new system view pg_stat_subscription_workers, that
shows information about any errors which occur during the application of
logical replication changes as well as during performing initial table
synchronization. The subscription statistics entries are removed when the
corresponding subscription is removed.

It also adds an SQL function pg_stat_reset_subscription_worker() to reset
single subscription errors.

The contents of this view can be used by an upcoming patch that skips the
particular transaction that conflicts with the existing data on the
subscriber.

This view can be extended in the future to track other xact related
statistics like the number of xacts committed/aborted for subscription
workers.

Author: Masahiko Sawada
Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-11-30 08:54:30 +05:30
Amit Kapila 875e02c2df Rename SnapBuild* macros in slot.c.
Same macro names for SnapBuildOnDiskNotChecksummedSize and
SnapBuildOnDiskChecksummedSize are being used in slot.c and snapbuild.c.
This patch renames them, in slot.c, to
ReplicationSlotOnDiskNotChecksummedSize and
ReplicationSlotOnDiskChecksummedSize similar to the other macros. This
makes all macro names look consistent in slot.c.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVZo-piDGzBOJRY4ob=_goFR6t9DhZMDMjJWN7LQs34Aw@mail.gmail.com
2021-11-24 08:09:00 +05:30
Alvaro Herrera 2fed48f48f
Be more specific about OOM in XLogReaderAllocate
A couple of spots can benefit from an added errdetail(), which matches
what we were already doing in other places; and those that cannot
withstand errdetail() can get a more descriptive primary message.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CALj2ACV+cX1eM03GfcA=ZMLXh5fSn1X1auJLz3yuS1duPSb9QA@mail.gmail.com
2021-11-22 13:43:43 -03:00
Robert Haas 1b098da200 Fix thinko in bbsink_throttle_manifest_contents.
Report and diagnosis by Dmitry Dolgov.

Discussion: http://postgr.es/m/20211115162641.dmo6l32fklh64gnw@localhost
2021-11-15 14:22:13 -05:00
Alvaro Herrera 0726c764bc
Restore lock level to set vacuum flags
Commit 27838981be mistakenly reduced the lock level from exclusive to
shared that is acquired to set PGPROC->statusFlags; this was reverted
by dcfff74fb1, but failed to do so in one spot.  Fix it.

Backpatch to 14.

Noted by Andres Freund.

Discussion: https://postgr.es/m/20211111020724.ggsfhcq3krq5r4hb@alap3.anarazel.de
2021-11-11 11:03:29 -03:00
Tom Lane c3b33698cf Doc: improve protocol spec for logical replication Type messages.
protocol.sgml documented the layout for Type messages, but completely
dropped the ball otherwise, failing to explain what they are, when
they are sent, or what they're good for.  While at it, do a little
copy-editing on the description of Relation messages.

In passing, adjust the comment for apply_handle_type() to make it
clearer that we choose not to do anything when receiving a Type
message, not that we think it has no use whatsoever.

Per question from Stefen Hillman.

Discussion: https://postgr.es/m/CAPgW8pMknK5pup6=T4a_UG=Cz80Rgp=KONqJmTdHfaZb0RvnFg@mail.gmail.com
2021-11-10 13:13:04 -05:00
Robert Haas 10eae82b27 Fix thinko in assertion in basebackup.c.
Commit 5a1007a508 tried to introduce
an assertion that the block size was at least twice the size of a
tar block, but I got the math wrong. My error was reported to me
off-list.
2021-11-10 10:12:20 -05:00
Michael Paquier c9c401a5e1 Improve error messages for some callers of XLogReadRecord()
A couple of code paths related to logical decoding (WAL sender, slot
advancing, etc.) use XLogReadRecord(), feeding on error messages
generated by walreader.c on a failure.  All those messages have no
context, making it harder to spot from where an error could come even if
these should not happen.  All the other callers of XLogReadRecord() do
that already.

Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/YYnTH6OyOwQcAdkw@paquier.xyz
2021-11-10 12:00:33 +09:00
Robert Haas 5a1007a508 Have the server properly terminate tar archives.
Earlier versions of PostgreSQL featured a version of pg_basebackup
that wanted to edit tar archives but was too dumb to parse them
properly. The server made things easier for the client by failing
to add the two blocks of zero bytes that ought to end a tar file,
leaving it up to the client to do that.

But since commit 23a1c6578c, we
don't need this hack any more, because pg_basebackup is now smarter
and can parse tar files even if they are properly terminated! So
change the server to always properly terminate the tar files. Older
versions of pg_basebackup can't talk to new servers anyway, so
there's no compatibility break.

On the pg_basebackup side, we see still need to add the terminating
zero bytes if we're talking to an older server, but not when the
server is v15+. Hopefully at some point we'll be able to remove
some of this compatibility cruft, but it seems best to hang on to
it for now.

In passing, add a file header comment to bbstreamer_tar.c, to make
it clearer what's going on here.

Discussion: http://postgr.es/m/CA+TgmoZbNzsWwM4BE5Jb_qHncY817DYZwGf+2-7hkMQ27ZwsMQ@mail.gmail.com
2021-11-09 14:29:15 -05:00
Robert Haas e997a0c642 Remove all use of ThisTimeLineID global variable outside of xlog.c
All such code deals with this global variable in one of three ways.
Sometimes the same functions use it in more than one of these ways
at the same time.

First, sometimes it's an implicit argument to one or more functions
being called in xlog.c or elsewhere, and must be set to the
appropriate value before calling those functions lest they
misbehave. In those cases, it is now passed as an explicit argument
instead.

Second, sometimes it's used to obtain the current timeline after
the end of recovery, i.e. the timeline to which WAL is being
written and flushed. Such code now calls GetWALInsertionTimeLine()
or relies on the new out parameter added to GetFlushRecPtr().

Third, sometimes it's used during recovery to store the current
replay timeline. That can change, so such code must generally
update the value before each use. It can still do that, but must
now use a local variable instead.

The net effect of these changes is to reduce by a fair amount the
amount of code that is directly accessing this global variable.
That's good, because history has shown that we don't always think
clearly about which timeline ID it's supposed to contain at any
given point in time, or indeed, whether it has been or needs to
be initialized at any given point in the code.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:50:01 -04:00
Robert Haas caf1f675b8 Don't set ThisTimeLineID when there's no reason to do so.
In slotfuncs.c, pg_replication_slot_advance() needs to determine
the LSN up to which the slot should be advanced, but that doesn't
require us to update ThisTimeLineID, because none of the code called
from here depends on it. If the replication slot is logical,
pg_logical_replication_slot_advance will call read_local_xlog_page,
which does use ThisTimeLineID, but also takes care of making sure
it's up to date. If the replication slot is physical, the timeline
isn't used for anything at all.

In logicalfuncs.c, pg_logical_slot_get_changes_guts() has the same
issue: the only code we're going to run that cares about timelines
is in or downstream of read_local_xlog_page, which already makes
sure that the correct value gets set. Hence, don't do it here.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:43:04 -04:00
Robert Haas bef47ff85d Introduce 'bbsink' abstraction to modularize base backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
understands knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.

To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup progress and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc'. There are plans to add more types
of 'bbsink' in future commits.

This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.

Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee,
Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger,
and Jeevan Ladhe.

Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 10:08:30 -04:00
Tom Lane f3d4019da5 Ensure consistent logical replication of datetime and float8 values.
In walreceiver, set the publisher's relevant GUCs (datestyle,
intervalstyle, extra_float_digits) to the same values that pg_dump uses,
and for the same reason: we need the output to be read the same way
regardless of the receiver's settings.  Without this, it's possible
for subscribers to misinterpret transmitted values.

Although this is clearly a bug fix, it's not without downsides:
subscribers that are storing values into some other datatype, such as
text, could get different results than before, and perhaps be unhappy
about that.  Given the lack of previous complaints, it seems best
to change this only in HEAD, and to call it out as an incompatible
change in v15.

Japin Li, per report from Sadhuprasad Patro

Discussion: https://postgr.es/m/CAFF0-CF=D7pc6st-3A9f1JnOt0qmc+BcBPVzD6fLYisKyAjkGA@mail.gmail.com
2021-11-02 14:28:50 -04:00
Alvaro Herrera 40c516bba8
Handle XLOG_OVERWRITE_CONTRECORD in DecodeXLogOp
Failing to do so results in inability of logical decoding to process the
WAL stream.  Handle it by doing nothing.

Backpatch all the way back.

Reported-by: Petr Jelínek <petr.jelinek@enterprisedb.com>
2021-11-01 13:07:23 -03:00
Robert Haas 2f5c4397c3 When fetching WAL for a basebackup, report errors with a sensible TLI.
The previous code used ThisTimeLineID, which need not even be
initialized here, although it usually was in practice, because
pg_basebackup issues IDENTIFY_SYSTEM before calling BASE_BACKUP,
and that initializes ThisTimeLineID as a side effect. That's not
really good enough, though, not only because we shoudn't be counting
on side effects like that, but also because the TLI could change
meanwhile. Fortunately, we have convenient access to more meaningful
TLI values, so use those instead.

Because of the way this logic is coded, the consequences of using
a possibly-incorrect TLI here are no worse than a slightly confusing
error message, I don't want to take any risk here, so no back-patch
at least for now.

Patch by me, reviewed by Kyotaro Horiguchi and Michael Paquier

Discussion: http://postgr.es/m/CA+TgmoZRNWGWYDX9RgTXMG6_nwSdB=PB-PPRUbvMUTGfmL2sHQ@mail.gmail.com
2021-10-29 14:00:32 -04:00
Amit Kapila 5a2832465f Allow publishing the tables of schema.
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.

The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;

A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.

Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.

Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-10-27 07:44:52 +05:30
Robert Haas 902a2c2800 Remove useless code from CreateReplicationSlot.
According to the comments, we initialize sendTimeLineIsHistoric
and sendTimeLine here for the benefit of WalSndSegmentOpen.
However, the only way that can happen is if logical_read_xlog_page
calls WALRead. And since logical_read_xlog_page initializes the
same global variables internally, we don't need to also do it here.

These initializations have been here since replication slots were
introduced in commit 858ec11858. They
were certainly useless at that time, too, because logical decoding
didn't yet exist then, and physical replication doesn't examine any
WAL at the time of slot creation. I haven't checked all the
intermediate versions, but I suspect there's no point at which
this code ever did anything useful.

To reduce future confusion, remove the code. Since there's no
functional defect, no back-patch.

Discussion: http://postgr.es/m/CA+TgmobSWzacEs+r6C-7DrOPDHoDar4i9gzxB3SCBr5qjnLmVQ@mail.gmail.com
2021-10-25 10:57:12 -04:00
Michael Paquier b4ada4e19f Add replication command READ_REPLICATION_SLOT
The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM) if the archive directory is found as empty, which would
be an advantage in the case of switching to a different archive
locations with the same slot used to avoid holes in WAL segment
archives.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
2021-10-25 07:40:42 +09:00
Michael Paquier 409f9ca447 Reset properly snapshot export state during transaction abort
During a replication slot creation, an ERROR generated in the same
transaction as the one creating a to-be-exported snapshot would have
left the backend in an inconsistent state, as the associated static
export snapshot state was not being reset on transaction abort, but only
on the follow-up command received by the WAL sender that created this
snapshot on replication slot creation.  This would trigger inconsistency
failures if this session tried to export again a snapshot, like during
the creation of a replication slot.

Note that a snapshot export cannot happen in a transaction block, so
there is no need to worry resetting this state for subtransaction
aborts.  Also, this inconsistent state would very unlikely show up to
users.  For example, one case where this could happen is an
out-of-memory error when building the initial snapshot to-be-exported.
Dilip found this problem while poking at a different patch, that caused
an error in this code path for reasons unrelated to HEAD.

Author: Dilip Kumar
Reviewed-by: Michael Paquier, Zhihong Yu
Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com
Backpatch-through: 9.6
2021-10-18 11:55:42 +09:00
Robert Haas 967a17fe2f Refactor basebackup.c's _tarWriteDir() function.
Sometimes, we replace a symbolic link that we find in the data
directory with an actual directory within the tarfile that we
create. _tarWriteDir was responsible both for making this
substitution and also for writing the tar header for the
resulting directory into the tar file. Make it do only the first
of those things, and rename to convert_link_to_directory.

Substantially larger refactoring of this source file is planned,
but this little bit seemed to make sense to commit
independently.

Discussion: http://postgr.es/m/CA+Tgmobz6tuv5tr-WxURe5JA1vVcGz85k4kkvoWxcyHvDpEqFA@mail.gmail.com
2021-10-12 13:11:29 -04:00
Robert Haas 0266e98c6b Flexible options for CREATE_REPLICATION_SLOT.
Like BASE_BACKUP, CREATE_REPLICATION_SLOT has historically used a
hard-coded syntax.  To improve future extensibility, adopt a flexible
options syntax here, too.

In the new syntax, instead of three mutually exclusive options
EXPORT_SNAPSHOT, USE_SNAPSHOT, and NOEXPORT_SNAPSHOT, there is now a single
SNAPSHOT option with three possible values: 'export', 'use', and 'nothing'.

This commit does not remove support for the old syntax. It just adds
the new one as an additional option, makes pg_receivewal,
pg_recvlogical, and walreceiver processes use it.

Patch by me, reviewed by Fabien Coelho, Sergei Kornilov, and
Fujii Masao.

Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 12:52:49 -04:00
Robert Haas 0ba281cb4b Flexible options for BASE_BACKUP.
Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's
hard to extend. Instead, adopt the same kind of syntax we've used for
SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's
not necessary for all of the option names to be parser keywords.

In the new syntax, most of the options now take an optional Boolean
argument. To match our practice in other in places, the options which
the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the
new syntax called WAIT and VERIFY_CHECKUMS, and the default value is
false. In the new syntax, the FAST option has been replaced by a
CHECKSUM option whose value may be 'fast' or 'spread'.

This commit does not remove support for the old syntax. It just adds
the new one as an additional option, and makes pg_basebackup prefer
the new syntax when the server is new enough to support it.

Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov,
Fujii Masao, and Tushar Ahuja.

Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
2021-10-05 11:50:21 -04:00
Amit Kapila 826584fa52 Remove obsolete comment in snapbuild.c.
Commits 955a684e04 and a975ff4980 removed the usage of running xacts
information from serialized snapshots but forgot to remove the
corresponding comment.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBifOr7RS=jRe7YCavc646y9omChv6zkWXvJeZcjS9mXA@mail.gmail.com
2021-10-05 09:05:40 +05:30
Amit Kapila 5e77625b26 Add parent table name in an error in reorderbuffer.c.
This can help in troubleshooting the cause of a particular error that can
occur during decoding.

Author: Jeremy Schneider
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/808ed65b-994c-915a-361c-577f088b837f@amazon.com
2021-09-22 07:42:52 +05:30
Michael Paquier 026ed8efd6 Remove code duplication for permission checks with replication slots
Two functions, both named check_permissions(), used the same checks to
verify if a user had required privileges to work on replication slots.
This commit removes the duplication, and moves the function doing the
checks to slot.c to be centralized.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Euler Taveira
Discussion: https://postgr.es/m/CALj2ACUPpVw1u7sQocFVWrSs0n10pt_G_4NPZKSxXK6cW1dErw@mail.gmail.com
2021-09-14 10:15:49 +09:00
Amit Kapila df3640e529 Fix reorder buffer memory accounting for toast changes.
While processing toast changes in logical decoding, we rejigger the
tuple change to point to in-memory toast tuples instead to on-disk toast
tuples. And, to make sure the memory accounting is correct, we were
subtracting the old change size and then after re-computing the new tuple,
re-adding its size at the end. Now, if there is any error before we add
the new size, we will release the changes and that will update the
accounting info (subtracting the size from the counters). And we were
underflowing there which leads to an assertion failure in assert enabled
builds and wrong memory accounting in reorder buffer otherwise.

Author: Bertrand Drouvot
Reviewed-by: Amit Kapila
Backpatch-through: 13, where memory accounting was introduced
Discussion: https://postgr.es/m/92b0ee65-b8bd-e42d-c082-4f3f4bf12d34@amazon.com
2021-09-13 10:24:00 +05:30
Fujii Masao 596ba75cb1 Fix issue with WAL archiving in standby.
Previously, walreceiver always closed the currently-opened WAL segment
and created its archive notification file, after it finished writing
the current segment up and received any WAL data that should be
written into the next segment. If walreceiver exited just before
any WAL data in the next segment arrived at standby, it did not
create the archive notification file of the current segment
even though that's known completed. This behavior could cause
WAL archiving of the segment to be delayed until subsequent
restartpoints or checkpoints created its notification file.

To fix the issue, this commit changes walreceiver so that it creates
an archive notification file of a current WAL segment immediately
if that's known completed before receiving next WAL data.

Back-patch to all supported branches.

Reported-by: Kyotaro Horiguchi
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200630.165503.1465894182551545886.horikyota.ntt@gmail.com
2021-09-09 23:56:57 +09:00
Amit Kapila 4c3478859b Log new catalog xmin candidate in LogicalIncreaseXminForSlot().
Similar to LogicalIncreaseRestartDecodingForSlot() add a debug message to
LogicalIncreaseXminForSlot() reporting a new catalog_xmin candidate.

This just adds additional diagnostic information during logical decoding that
can aid debugging.

Author: Ashutosh Bapat
Reviewed-by: Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/CAExHW5usQWbiUz0hHOCu5twS1O9DvpcPojf6sor=8q--VUuMbA@mail.gmail.com
2021-09-07 08:07:11 +05:30
Alvaro Herrera 96b665083e
Revert "Avoid creating archive status ".ready" files too early"
This reverts commit 515e3d84a0 and equivalent commits in back
branches.  This solution to the problem has a number of problems, so
we'll try again with a different approach.

Per note from Andres Freund

Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de
2021-09-04 12:14:30 -04:00
Amit Kapila 31c389d8de Optimize fileset usage in apply worker.
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be
created under the same fileset and the files will be deleted
after the transaction is completed.

This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.

Author: Dilip Kumar, based on suggestion by Thomas Munro
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
2021-09-02 08:13:46 +05:30
Amit Kapila bad6cef32c Fix incorrect error code in StartupReplicationOrigin().
ERRCODE_CONFIGURATION_LIMIT_EXCEEDED was used for checksum failure, use
ERRCODE_DATA_CORRUPTED instead.

Reported-by: Tatsuhito Kasahara
Author: Tatsuhito Kasahara
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAP0=ZVLHtYffs8SOWcFJWrBGoRzT9QQbk+_aP+E5AHLNXiOorA@mail.gmail.com
2021-08-30 09:14:31 +05:30
Amit Kapila dcac5e7ac1 Refactor sharedfileset.c to separate out fileset implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.

Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
2021-08-30 08:48:15 +05:30
Amit Kapila abc0910e2e Add logical change details to logical replication worker errcontext.
Previously, on the subscriber, we set the error context callback for the
tuple data conversion failures. This commit replaces the existing error
context callback with a comprehensive one so that it shows not only the
details of data conversion failures but also the details of logical change
being applied by the apply worker or table sync worker. The additional
information displayed will be the command, transaction id, and timestamp.

The error context is added to an error only when applying a change but not
while doing other work like receiving data etc.

This will help users in diagnosing the problems that occur during logical
replication. It also can be used for future work that allows skipping a
particular transaction on the subscriber.

Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Greg Nancarrow, Haiying Tang, Amit Kapila
Tested-by: Haiying Tang
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-27 08:30:23 +05:30
Alvaro Herrera 515e3d84a0
Avoid creating archive status ".ready" files too early
WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
2021-08-23 15:50:35 -04:00
Michael Paquier a3fcbcda75 Fix backup manifests to generate correct WAL-Ranges across timelines
In a backup manifest, WAL-Ranges stores the range of WAL that is
required for the backup to be valid.  pg_verifybackup would then
internally use pg_waldump for the checks based on this data.

When the timeline where the backup started was more than 1 with a
history file looked at for the manifest data generation, the calculation
of the WAL range for the first timeline to check was incorrect.  The
previous logic used as start LSN the start position of the first
timeline, but it needs to use the start LSN of the backup.  This would
cause failures with pg_verifybackup, or any tools making use of the
backup manifests.

This commit adds a test based on a logic using a self-promoted node,
making it rather cheap.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20210818.143031.1867083699202617521.horikyota.ntt@gmail.com
Backpatch-through: 13
2021-08-23 11:09:33 +09:00
Amit Kapila 4cd7a18968 Rename LOGICAL_REP_MSG_STREAM_END to LOGICAL_REP_MSG_STREAM_STOP.
In the code, most places used the term "Stream Stop" for the logical
stream message. This commit improves consistency by renaming LogicalRepMsgType
"LOGICAL_REP_MSG_STREAM_END" to "LOGICAL_REP_MSG_STREAM_STOP".

Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-19 09:34:26 +05:30
Michael Paquier 2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Fujii Masao 93d573d865 Remove unused argument "txn" in maybe_send_schema().
Commit 464824323e added the argument "txn" into maybe_send_schema()
though it was not used.

Author: Hou Zhijie
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/OS0PR01MB5716146E43928FB92D45D29794EC9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2021-08-05 17:49:51 +09:00
Amit Kapila 63cf61cdeb Add prepare API support for streaming transactions in logical replication.
Commit a8fd13cab0 added support for prepared transactions to built-in
logical replication via a new option "two_phase" for a subscription. The
"two_phase" option was not allowed with the existing streaming option.

This commit permits the combination of "streaming" and "two_phase"
subscription options. It extends the pgoutput plugin and the subscriber
side code to add the prepare API for streaming transactions which will
apply the changes accumulated in the spool-file at prepare time.

Author: Peter Smith and Ajin Cherian
Reviewed-by: Vignesh C, Amit Kapila, Greg Nancarrow
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
Discussion: https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-08-04 07:47:06 +05:30
Jeff Davis 14d474e079 Improve documentation for START_REPLICATION ... LOGICAL.
The starting point may not be exactly what the client requested; it
may be at the slot's confirmed_flush_lsn.

Also, upgrade the message from DEBUG1 to LOG when this happens.

Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/c5c861d576f2511732f8002c76245da587110b1c.camel%40j-davis.com
2021-07-30 15:12:20 -07:00
Amit Kapila 16bd4becee Remove unused argument in apply_handle_commit_internal().
Oversight in commit 0926e96c49.

Author: Masahiko Sawada
Reviewed-By: Amit Kapila
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-07-30 08:17:38 +05:30
Amit Kapila 91f9861242 Refactor to make common functions in proto.c and worker.c.
This is a non-functional change only to refactor code to extract some
replication logic into static functions.

This is done as preparation for the 2PC streaming patch which also shares
this common logic.

Author: Peter Smith
Reviewed-By: Amit Kapila
Discussion: https://postgr.es/m/CAHut+PuiSA8AiLcE2N5StzSKs46SQEP_vDOUD5fX2XCVtfZ7mQ@mail.gmail.com
2021-07-29 15:51:45 +05:30
Michael Paquier 7b7fbe1e8b Clarify some comments making use of leetspeak term "up2date"
Most of these are new, as of a8fd13c, and "up-to-date" is much easier to
parse for the average reader.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PtHbHvgOjs_R9LyDF21j-Wn8SxoTtWMQNP2ifXN6t2cSg@mail.gmail.com
2021-07-28 10:31:24 +09:00
Amit Kapila 01c3adcdd8 Fix potential buffer overruns in proto.c.
Prevent potential buffer overruns when using strcpy to gid buffer. This
has been introduced by commit a8fd13cab0.

Reported-by: Tom Lane as per coverity
Author: Peter Smith
Reviewed-by: Amit Kapila
Discussion: https://www.postgresql.org/message-id/161029.1626639923%40sss.pgh.pa.us
2021-07-20 08:15:01 +05:30
Alvaro Herrera ead9e51e82
Advance old-segment horizon properly after slot invalidation
When some slots are invalidated due to the max_slot_wal_keep_size limit,
the old segment horizon should move forward to stay within the limit.
However, in commit c655077639 we forgot to call KeepLogSeg again to
recompute the horizon after invalidating replication slots.  In cases
where other slots remained, the limits would be recomputed eventually
for other reasons, but if all slots were invalidated, the limits would
not move at all afterwards.  Repair.

Backpatch to 13 where the feature was introduced.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Marcin Krupowicz <mk@071.ovh>
Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
2021-07-16 12:07:30 -04:00
Amit Kapila a8fd13cab0 Add support for prepared transactions to built-in logical replication.
To add support for streaming transactions at prepare time into the
built-in logical replication, we need to do the following things:

* Modify the output plugin (pgoutput) to implement the new two-phase API
callbacks, by leveraging the extended replication protocol.

* Modify the replication apply worker, to properly handle two-phase
transactions by replaying them on prepare.

* Add a new SUBSCRIPTION option "two_phase" to allow users to enable
two-phase transactions. We enable the two_phase once the initial data sync
is over.

We however must explicitly disable replication of two-phase transactions
during replication slot creation, even if the plugin supports it. We
don't need to replicate the changes accumulated during this phase,
and moreover, we don't have a replication connection open so we don't know
where to send the data anyway.

The streaming option is not allowed with this new two_phase option. This
can be done as a separate patch.

We don't allow to toggle two_phase option of a subscription because it can
lead to an inconsistent replica. For the same reason, we don't allow to
refresh the publication once the two_phase is enabled for a subscription
unless copy_data option is false.

Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-14 07:33:50 +05:30
Jeff Davis 8e7811e952 Eliminate replication protocol error related to IDENTIFY_SYSTEM.
The requirement that IDENTIFY_SYSTEM be run before START_REPLICATION
was both undocumented and unnecessary. Remove the error and ensure
that ThisTimeLineID is initialized in START_REPLICATION.

Elect not to backport because this requirement was expected behavior
(even if inconsistently enforced), and is not likely to cause any
major problem.

Author: Jeff Davis
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/de4bbf05b7cd94227841c433ea6ff71d2130c713.camel%40j-davis.com
2021-07-09 11:37:45 -07:00
Daniel Gustafsson 387925893e Fix typos in pgstat.c, reorderbuffer.c and pathnodes.h
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/50250765-5B87-4AD7-9770-7FCED42A6175@yesql.se
2021-07-08 12:45:09 +02:00
Tom Lane 50371df266 Don't try to print data type names in slot_store_error_callback().
The existing code tried to do syscache lookups in an already-failed
transaction, which is problematic to say the least.  After some
consideration of alternatives, the best fix seems to be to just drop
type names from the error message altogether.  The table and column
names seem like sufficient localization.  If the user is unsure what
types are involved, she can check the local and remote table
definitions.

Having done that, we can also discard the LogicalRepTypMap hash
table, which had no other use.  Arguably, LOGICAL_REP_MSG_TYPE
replication messages are now obsolete as well; but we should
probably keep them in case some other use emerges.  (The complexity
of removing something from the replication protocol would likely
outweigh any savings anyhow.)

Masahiko Sawada and Bharath Rupireddy, per complaint from Andres
Freund.  Back-patch to v10 where this code originated.

Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
2021-07-02 16:04:54 -04:00
Amit Kapila 52d26d560e Allow streaming the changes after speculative aborts.
Until now, we didn't allow to stream the changes in logical replication
till we receive speculative confirm or the next DML change record after
speculative inserts. The reason was that we never use to process
speculative aborts but after commit 4daa140a2f it is possible to process
them so we can allow streaming once we receive speculative abort after
speculative insertion.

We decided to backpatch to 14 where the feature for streaming in progress
transactions have been introduced as this is a minor change and makes that
functionality better.

Author: Amit Kapila
Reviewed-By: Dilip Kumar
Backpatch-through: 14
Discussion: https://postgr.es/m/CAA4eK1KdqmTCtrBR6oFfGELrLLbDLDedL6zACcsUOQuTJBj1vw@mail.gmail.com
2021-06-30 09:37:59 +05:30
Amit Kapila cda03cfed6 Allow enabling two-phase option via replication protocol.
Extend the replication command CREATE_REPLICATION_SLOT to support the
TWO_PHASE option. This will allow decoding commands like PREPARE
TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED for slots created with
this option. The decoding of the transaction happens at prepare command.

This patch also adds support of two-phase in pg_recvlogical via a new
option --two-phase.

This option will also be used by future patches that allow streaming of
transactions at prepare time for built-in logical replication. With this,
the out-of-core logical replication solutions can enable replication of
two-phase transactions via replication protocol.

Author: Ajin Cherian
Reviewed-By: Jeff Davis, Vignesh C, Amit Kapila
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/64b9f783c6e125f18f88fbc0c0234e34e71d8639.camel@j-davis.com
2021-06-30 08:45:47 +05:30
Noah Misch 2b3e4672f7 Don't ERROR on PreallocXlogFiles() race condition.
Before a restartpoint finishes PreallocXlogFiles(), a startup process
KeepFileRestoredFromArchive() call can unlink the preallocated segment.
If a CHECKPOINT sql command had elicited the restartpoint experiencing
the race condition, that sql command failed.  Moreover, the restartpoint
omitted its log_checkpoints message and some inessential resource
reclamation.  Prevent the ERROR by skipping open() of the segment.
Since these consequences are so minor, no back-patch.

Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28 18:34:56 -07:00
Noah Misch 421484f79c Remove XLogFileInit() ability to unlink a pre-existing file.
Only initdb used it.  initdb refuses to operate on a non-empty directory
and generally does not cope with pre-existing files of other kinds.
Hence, use the opportunity to simplify.

Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28 18:34:56 -07:00
Noah Misch c53c6b98d3 Remove XLogFileInit() ability to skip ControlFileLock.
Cold paths, initdb and end-of-recovery, used it.  Don't optimize them.

Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
2021-06-28 18:34:55 -07:00
Andrew Dunstan e1c1c30f63
Pre branch pgindent / pgperltidy run
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28 11:05:54 -04:00
Peter Eisentraut c302a61390 Error message refactoring
Take some untranslatable things out of the message and replace by
format placeholders, to reduce translatable strings and reduce
translation mistakes.
2021-06-27 09:41:16 +02:00
Tom Lane 6b787d9e32 Improve SQLSTATE reporting in some replication-related code.
I started out with the goal of reporting ERRCODE_CONNECTION_FAILURE
when walrcv_connect() fails, but as I looked around I realized that
whoever wrote this code was of the opinion that errcodes are purely
optional.  That's not my understanding of our project policy.  Hence,
make sure that an errcode is provided in each ereport that (a) is
ERROR or higher level and (b) isn't arguably an internal logic error.
Also fix some very dubious existing errcode assignments.

While this is not per policy, it's also largely cosmetic, since few
of these cases could get reported to applications.  So I don't
feel a need to back-patch.

Discussion: https://postgr.es/m/2189704.1623512522@sss.pgh.pa.us
2021-06-16 11:52:05 -04:00
Amit Kapila 4daa140a2f Fix decoding of speculative aborts.
During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
2021-06-15 08:28:36 +05:30
Alvaro Herrera 33c5099567
Fix logic bug in 1632ea4368
I overlooked that one condition was logically inverted.  The fix is a
little bit more involved than simply negating the condition, to make
the code easier to read.

Fix some outdated comments left by the same commit, while at it.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/YMRlmB3/lZw8YBH+@paquier.xyz
2021-06-14 16:31:12 -04:00
Tom Lane fe6a20ce54 Don't use Asserts to check for violations of replication protocol.
Using an Assert to check the validity of incoming messages is an
extremely poor decision.  In a debug build, it should not be that easy
for a broken or malicious remote client to crash the logrep worker.
The consequences could be even worse in non-debug builds, which will
fail to make such checks at all, leading to who-knows-what misbehavior.
Hence, promote every Assert that could possibly be triggered by wrong
or out-of-order replication messages to a full test-and-ereport.

To avoid bloating the set of messages the translation team has to cope
with, establish a policy that replication protocol violation error
reports don't need to be translated.  Hence, all the new messages here
use errmsg_internal().  A couple of old messages are changed likewise
for consistency.

Along the way, fix some non-idiomatic or outright wrong uses of
hash_search().

Most of these mistakes are new with the "streaming replication"
patch (commit 464824323), but a couple go back a long way.
Back-patch as appropriate.

Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us
2021-06-12 12:59:15 -04:00
Tom Lane ab55d742eb Fix multiple crasher bugs in partitioned-table replication logic.
apply_handle_tuple_routing(), having detected and reported that
the tuple it needed to update didn't exist, tried to update that
tuple anyway, leading to a null-pointer dereference.

logicalrep_partition_open() failed to ensure that the
LogicalRepPartMapEntry it built for a partition was fully
independent of that for the partition root, leading to
trouble if the root entry was later freed or rebuilt.

Meanwhile, on the publisher's side, pgoutput_change() sometimes
attempted to apply execute_attr_map_tuple() to a NULL tuple.

The first of these was reported by Sergey Bernikov in bug #17055;
I found the other two while developing some test cases for this
sadly under-tested code.

Diagnosis and patch for the first issue by Amit Langote; patches
for the others by me; new test cases by me.  Back-patch to v13
where this logic came in.

Discussion: https://postgr.es/m/17055-9ba800ec8522668b@postgresql.org
2021-06-11 16:12:41 -04:00
Alvaro Herrera 1632ea4368
Return ReplicationSlotAcquire API to its original form
Per 96540f80f833; the awkward API introduced by c655077639 is no
longer needed.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20210408020913.zzprrlvqyvlt5cyy@alap3.anarazel.de
2021-06-11 15:48:26 -04:00
Alvaro Herrera 96540f80f8
Fix race condition in invalidating obsolete replication slots
The code added to mark replication slots invalid in commit c655077639
had the race condition that a slot can be dropped or advanced
concurrently with checkpointer trying to invalidate it.  Rewrite the
code to close those races.

The changes to ReplicationSlotAcquire's API added with c655077639 are
not necessary anymore.  To avoid an ABI break in released branches, this
commit leaves that unchanged; it'll be changed in a master-only commit
separately.

Backpatch to 13, where this code first appeared.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Andres Freund <andres@anarazel.de>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20210408001037.wfmk6jud36auhfqm@alap3.anarazel.de
2021-06-11 12:16:14 -04:00
Tom Lane 3a09d75b4f Rearrange logrep worker's snapshot handling some more.
It turns out that worker.c's code path for TRUNCATE was also
careless about establishing a snapshot while executing user-defined
code, allowing the checks added by commit 84f5c2908 to fail when
a trigger is fired in that context.

We could just wrap Push/PopActiveSnapshot around the truncate call,
but it seems better to establish a policy of holding a snapshot
throughout execution of a replication step.  To help with that and
possible future requirements, replace the previous ensure_transaction
calls with pairs of begin/end_replication_step calls.

Per report from Mark Dilger.  Back-patch to v11, like the previous
changes.

Discussion: https://postgr.es/m/B4A3AF82-79ED-4F4C-A4E5-CD2622098972@enterprisedb.com
2021-06-10 12:27:27 -04:00
Peter Eisentraut b29fa951ec Add some const decorations
One of these functions is new in PostgreSQL 14; might as well start it
out right.
2021-06-10 16:21:48 +02:00
Amit Kapila be57f21650 Remove two_phase variable from CreateReplicationSlotCmd struct.
Commit 19890a064e added the option to enable two_phase commits via
pg_create_logical_replication_slot but didn't extend the support of same
in replication protocol. However, by mistake, it added the two_phase
variable in CreateReplicationSlotCmd which is required only when we extend
the replication protocol.

Reported-by: Jeff Davis
Author: Ajin Cherian
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/64b9f783c6e125f18f88fbc0c0234e34e71d8639.camel@j-davis.com
2021-06-07 09:32:06 +05:30
Amit Kapila eb89cb43a0 pgoutput: Fix memory leak due to RelationSyncEntry.map.
Release memory allocated when creating the tuple-conversion map and its
component TupleDescs when its owning sync entry is invalidated.
TupleDescs must also be freed when no map is deemed necessary, to begin
with.

Reported-by: Andres Freund
Author: Amit Langote
Reviewed-by: Takamichi Osumi, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/MEYP282MB166933B1AB02B4FE56E82453B64D9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-06-01 14:27:14 +05:30
Amit Kapila 6f4bdf8152 Fix assertion during streaming of multi-insert toast changes.
While decoding the multi-insert WAL we can't clean the toast untill we get
the last insert of that WAL record. Now if we stream the changes before we
get the last change, the memory for toast chunks won't be released and we
expect the txn to have streamed all changes after streaming.  This
restriction is mainly to ensure the correctness of streamed transactions
and it doesn't seem worth uplifting such a restriction just to allow this
case because anyway we will stream the transaction once such an insert is
complete.

Previously we were using two different flags (one for toast tuples and
another for speculative inserts) to indicate partial changes. Now instead
we replaced both of them with a single flag to indicate partial changes.

Reported-by: Pavan Deolasee
Author: Dilip Kumar
Reviewed-by: Pavan Deolasee, Amit Kapila
Discussion: https://postgr.es/m/CABOikdN-_858zojYN-2tNcHiVTw-nhxPwoQS4quExeweQfG1Ug@mail.gmail.com
2021-05-27 07:59:43 +05:30
Tom Lane b39630fd41 Fix access to no-longer-open relcache entry in logical-rep worker.
If we redirected a replicated tuple operation into a partition child
table, and then tried to fire AFTER triggers for that event, the
relation cache entry for the child table was already closed.  This has
no visible ill effects as long as the entry is still there and still
valid, but an unluckily-timed cache flush could result in a crash or
other misbehavior.

To fix, postpone the ExecCleanupTupleRouting call (which is what
closes the child table) until after we've fired triggers.  This
requires a bit of refactoring so that the cleanup function can
have access to the necessary state.

In HEAD, I took the opportunity to simplify some of worker.c's
function APIs based on use of the new ApplyExecutionData struct.
However, it doesn't seem safe/practical to back-patch that aspect,
at least not without a lot of analysis of possible interactions
with a04daa97a.

In passing, add an Assert to afterTriggerInvokeEvents to catch
such cases.  This seems worthwhile because we've grown a number
of fairly unstructured ways of calling AfterTriggerEndQuery.

Back-patch to v13, where worker.c grew the ability to deal with
partitioned target tables.

Discussion: https://postgr.es/m/3382681.1621381328@sss.pgh.pa.us
2021-05-22 21:25:29 -04:00
Tom Lane 84f5c2908d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:59 -04:00
Amit Kapila 6d0eb38557 Fix deadlock for multiple replicating truncates of the same table.
While applying the truncate change, the logical apply worker acquires
RowExclusiveLock on the relation being truncated. This allowed truncate on
the relation at a time by two apply workers which lead to a deadlock. The
reason was that one of the workers after updating the pg_class tuple tries
to acquire SHARE lock on the relation and started to wait for the second
worker which has acquired RowExclusiveLock on the relation. And when the
second worker tries to update the pg_class tuple, it starts to wait for
the first worker which leads to a deadlock. Fix it by acquiring
AccessExclusiveLock on the relation before applying the truncate change as
we do for normal truncate operation.

Author: Peter Smith, test case by Haiying Tang
Reviewed-by: Dilip Kumar, Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/CAHut+PsNm43p0jM+idTvWwiGZPcP0hGrHMPK9TOAkc+a4UpUqw@mail.gmail.com
2021-05-21 07:54:27 +05:30
Alvaro Herrera db16c65647
Rename the logical replication global "wrconn"
The worker.c global wrconn is only meant to be used by logical apply/
tablesync workers, but there are other variables with the same name. To
reduce future confusion rename the global from "wrconn" to
"LogRepWorkerWalRcvConn".

While this is just cosmetic, it seems better to backpatch it all the way
back to 10 where this code appeared, to avoid future backpatching
issues.

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
2021-05-12 19:13:54 -04:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Thomas Munro c2dc19342e Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-10 16:06:09 +12:00
Amit Kapila 592f00f8de Update replication statistics after every stream/spill.
Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted the stats might not get
updated. Fixed this by updating replication statistics after every
stream/spill.

In passing update the docs to change the description of some of the slot
stats.

Author: Vignesh C, Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-05-06 11:21:26 +05:30
Amit Kapila 2ce353fc19 Tighten the concurrent abort check during decoding.
During decoding of an in-progress or prepared transaction, we detect
concurrent abort with an error code ERRCODE_TRANSACTION_ROLLBACK. That is
not sufficient because a callback can decide to throw that error code
at other times as well.

Reported-by: Tom Lane
Author: Amit Kapila
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAA4eK1KCjPRS4aZHB48QMM4J8XOC1+TD8jo-4Yu84E+MjwqVhA@mail.gmail.com
2021-05-06 08:26:42 +05:30
Amit Kapila 205f466282 Fix the computation of slot stats for 'total_bytes'.
Previously, we were using the size of all the changes present in
ReorderBuffer to compute total_bytes after decoding a transaction and that
can lead to counting some of the transactions' changes more than once. Fix
it by using the size of the changes decoded for a transaction to compute
'total_bytes'.

Author: Sawada Masahiko
Reviewed-by: Vignesh C, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-05-03 07:22:08 +05:30
Amit Kapila ee4ba01dbb Fix the bugs in selecting the transaction for streaming.
There were two problems:
a. We were always selecting the next available txn instead of selecting it
when it is larger than the previous transaction.
b. We were selecting the transactions which haven't made any changes to
the database (base snapshot is not set). Later it was hitting an Assert
because we don't decode such transactions and the changes in txn remain as
it is. It is better not to choose such transactions for streaming in the
first place.

Reported-by: Haiying Tang
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB61133B94E63177040F7ECDA1FB429@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-04-30 10:49:52 +05:30
Tom Lane 9626325da5 Add heuristic incoming-message-size limits in the server.
We had a report of confusing server behavior caused by a client bug
that sent junk to the server: the server thought the junk was a
very long message length and waited patiently for data that would
never come.  We can reduce the risk of that by being less trusting
about message lengths.

For a long time, libpq has had a heuristic rule that it wouldn't
believe large message size words, except for a small number of
message types that are expected to be (potentially) long.  This
provides some defense against loss of message-boundary sync and
other corrupted-data cases.  The server does something similar,
except that up to now it only limited the lengths of messages
received during the connection authentication phase.  Let's
do the same as in libpq and put restrictions on the allowed
length of all messages, while distinguishing between message
types that are expected to be long and those that aren't.

I used a limit of 10000 bytes for non-long messages.  (libpq's
corresponding limit is 30000 bytes, but given the asymmetry of
the FE/BE protocol, there's no good reason why the numbers should
be the same.)  Experimentation suggests that this is at least a
factor of 10, maybe a factor of 100, more than we really need;
but plenty of daylight seems desirable to avoid false positives.
In any case we can adjust the limit based on beta-test results.

For long messages, set a limit of MaxAllocSize - 1, which is the
most that we can absorb into the StringInfo buffer that the message
is collected in.  This just serves to make sure that a bogus message
size is reported as such, rather than as a confusing gripe about
not being able to enlarge a string buffer.

While at it, make sure that non-mainline code paths (such as
COPY FROM STDIN) are as paranoid as SocketBackend is, and validate
the message type code before believing the message length.
This provides an additional guard against getting stuck on corrupted
input.

Discussion: https://postgr.es/m/2003757.1619373089@sss.pgh.pa.us
2021-04-28 15:50:46 -04:00
Fujii Masao 8e9ea08bae Don't pass "ONLY" options specified in TRUNCATE to foreign data wrapper.
Commit 8ff1c94649 allowed TRUNCATE command to truncate foreign tables.
Previously the information about "ONLY" options specified in TRUNCATE
command were passed to the foreign data wrapper. Then postgres_fdw
constructed the TRUNCATE command to issue the remote server and
included "ONLY" options in it based on the passed information.

On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE
have no effect when accessing or modifying the remote table, i.e.,
are not passed to the foreign data wrapper. So it's inconsistent to
make only TRUNCATE command pass the "ONLY" options to the foreign data
wrapper. Therefore this commit changes the TRUNCATE command so that
it doesn't pass the "ONLY" options to the foreign data wrapper,
for the consistency with other statements. Also this commit changes
postgres_fdw so that it always doesn't include "ONLY" options in
the TRUNCATE command that it constructs.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu
Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com
2021-04-27 14:41:27 +09:00
Amit Kapila 3fa17d3771 Use HTAB for replication slot statistics.
Previously, we used to use the array of size max_replication_slots to
store stats for replication slots. But that had two problems in the cases
where a message for dropping a slot gets lost: 1) the stats for the new
slot are not recorded if the array is full and 2) writing beyond the end
of the array if the user reduces the max_replication_slots.

This commit uses HTAB for replication slot statistics, resolving both
problems. Now, pgstat_vacuum_stat() search for all the dead replication
slots in stats hashtable and tell the collector to remove them. To avoid
showing the stats for the already-dropped slots, pg_stat_replication_slots
view searches slot stats by the slot name taken from pg_replication_slots.

Also, we send a message for creating a slot at slot creation, initializing
the stats. This reduces the possibility that the stats are accumulated
into the old slot stats when a message for dropping a slot gets lost.

Reported-by: Andres Freund
Author: Sawada Masahiko, test case by Vignesh C
Reviewed-by: Amit Kapila, Vignesh C, Dilip Kumar
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-27 09:09:11 +05:30
Amit Kapila e7eea52b2d Fix Logical Replication of Truncate in synchronous commit mode.
The Truncate operation acquires an exclusive lock on the target relation
and indexes. It then waits for logical replication of the operation to
finish at commit. Now because we are acquiring the shared lock on the
target index to get index attributes in pgoutput while sending the
changes for the Truncate operation, it leads to a deadlock.

Actually, we don't need to acquire a lock on the target index as we build
the cache entry using a historic snapshot and all the later changes are
absorbed while decoding WAL. So, we wrote a special purpose function for
logical replication to get a bitmap of replica identity attribute numbers
where we get that information without locking the target index.

We decided not to backpatch this as there doesn't seem to be any field
complaint about this issue since it was introduced in commit 5dfd1e5a in
v11.

Reported-by: Haiying Tang
Author: Takamichi Osumi, test case by Li Japin
Reviewed-by: Amit Kapila, Ajin Cherian
Discussion: https://postgr.es/m/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-04-27 08:28:26 +05:30
Amit Kapila f25a4584c6 Avoid sending prepare multiple times while decoding.
We send the prepare for the concurrently aborted xacts so that later when
rollback prepared is decoded and sent, the downstream should be able to
rollback such a xact. For 'streaming' case (when we send changes for
in-progress transactions), we were sending prepare twice when concurrent
abort was detected.

Author: Peter Smith
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com
2021-04-26 11:27:44 +05:30
Amit Kapila 6d2e87a077 Fix typo in reorderbuffer.c.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PtvzuYY0zu=dVRK_WVz5WGos1+otZWgEWqjha1ncoSRag@mail.gmail.com
2021-04-26 08:42:46 +05:30
Michael Paquier f3b141c482 Fix relation leak for subscribers firing triggers in logical replication
Creating a trigger on a relation to which an apply operation is
triggered would cause a relation leak once the change gets committed,
as the executor would miss that the relation needs to be closed
beforehand.  This issue got introduced with the refactoring done in
1375422c, where it becomes necessary to track relations within
es_opened_result_relations to make sure that they are closed.

We have discussed using ExecInitResultRelation() coupled with
ExecCloseResultRelations() for the relations in need of tracking by the
apply operations in the subscribers, which would simplify greatly the
opening and closing of indexes, but this requires a larger rework and
reorganization of the worker code, particularly for the tuple routing
part.  And that's not really welcome post feature freeze.  So, for now,
settle down to the same solution as TRUNCATE which is to fill in
es_opened_result_relations with the relation opened, to make sure that
ExecGetTriggerResultRel() finds them and that they get closed.

The code is lightly refactored so as a relation is not registered three
times for each DML code path, making the whole a bit easier to follow.

Reported-by: Tang Haiying, Shi Yu, Hou Zhijie
Author: Amit Langote, Masahiko Sawada, Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/OS0PR01MB611383FA0FE92EB9DE21946AFB769@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-04-22 12:48:54 +09:00
Peter Eisentraut 640b91c3ed Use correct format placeholder for pids
Should be signed, not unsigned.
2021-04-19 10:43:18 +02:00
Peter Eisentraut f59b58e2a1 Use correct format placeholder for block numbers
Should be %u rather than %d.
2021-04-17 09:40:50 +02:00
Amit Kapila f5fc2f5b23 Add information of total data processed to replication slot stats.
This adds the statistics about total transactions count and total
transaction data logically sent to the decoding output plugin from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats.

Suggested-by: Andres Freund
Author: Vignesh C and Amit Kapila
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-16 07:34:43 +05:30
Amit Kapila cca57c1d9b Use NameData datatype for slotname in stats.
This will make it consistent with the other usage of slotname in the code.
In the passing, change pgstat_report_replslot signature to use a structure
rather than multiple parameters.

Reported-by: Andres Freund
Author: Vignesh C
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-14 08:55:03 +05:30
Fujii Masao 8ff1c94649 Allow TRUNCATE command to truncate foreign tables.
This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.

The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.

For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.

Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
2021-04-08 20:56:08 +09:00
Thomas Munro f003d9f872 Add circular WAL decoding buffer.
Teach xlogreader.c to decode its output into a circular buffer, to
support optimizations based on looking ahead.

 * XLogReadRecord() works as before, consuming records one by one, and
   allowing them to be examined via the traditional XLogRecGetXXX()
   macros.

 * An alternative new interface XLogNextRecord() is added that returns
   pointers to DecodedXLogRecord structs that can be examined directly.

 * XLogReadAhead() provides a second cursor that lets you see
   further ahead, as long as data is available and there is enough space
   in the decoding buffer.  This returns DecodedXLogRecord pointers to the
   caller, but also adds them to a queue of records that will later be
   consumed by XLogNextRecord()/XLogReadRecord().

The buffer's size is controlled with wal_decode_buffer_size.  The buffer
could potentially be placed into shared memory, for future projects.
Large records that don't fit in the circular buffer are called
"oversized" and allocated separately with palloc().

Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
2021-04-08 23:20:42 +12:00
Thomas Munro 323cbe7c7d Remove read_page callback from XLogReader.
Previously, the XLogReader module would fetch new input data using a
callback function.  Redesign the interface so that it tells the caller
to insert more data with a special return value instead.  This API suits
later patches for prefetching, encryption and maybe other future
projects that would otherwise require continually extending the callback
interface.

As incidental cleanup work, move global variables readOff, readLen and
readSegNo inside XlogReaderState.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version)
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
2021-04-08 23:20:42 +12:00
Peter Eisentraut 0b5e824528 Message improvement
The previous wording contained a superfluous comma.  Adjust phrasing
for grammatical correctness and clarity.
2021-04-07 07:42:44 +02:00
Tom Lane c5b7ba4e67 Postpone some stuff out of ExecInitModifyTable.
Arrange to do some things on-demand, rather than immediately during
executor startup, because there's a fair chance of never having to do
them at all:

* Don't open result relations' indexes until needed.

* Don't initialize partition tuple routing, nor the child-to-root
tuple conversion map, until needed.

This wins in UPDATEs on partitioned tables when only some of the
partitions will actually receive updates; with larger partition
counts the savings is quite noticeable.  Also, we can remove some
sketchy heuristics in ExecInitModifyTable about whether to set up
tuple routing.

Also, remove execPartition.c's private hash table tracking which
partitions were already opened by the ModifyTable node.  Instead
use the hash added to ModifyTable itself by commit 86dc90056.

To allow lazy computation of the conversion maps, we now set
ri_RootResultRelInfo in all child ResultRelInfos.  We formerly set it
only in some, not terribly well-defined, cases.  This has user-visible
side effects in that now more error messages refer to the root
relation instead of some partition (and provide error data in the
root's column order, too).  It looks to me like this is a strict
improvement in consistency, so I don't have a problem with the
output changes visible in this commit.

Extracted from a larger patch, which seemed to me to be too messy
to push in one commit.

Amit Langote, reviewed at different times by Heikki Linnakangas and
myself

Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
2021-04-06 15:57:11 -04:00
Peter Geoghegan 8523492d4e Remove tupgone special case from vacuumlazy.c.
Retry the call to heap_prune_page() in rare cases where there is
disagreement between the heap_prune_page() call and the call to
HeapTupleSatisfiesVacuum() that immediately follows.  Disagreement is
possible when a concurrently-aborted transaction makes a tuple DEAD
during the tiny window between each step.  This was the only case where
a tuple considered DEAD by VACUUM still had storage following pruning.
VACUUM's definition of dead tuples is now uniformly simple and
unambiguous: dead tuples from each page are always LP_DEAD line pointers
that were encountered just after we performed pruning (and just before
we considered freezing remaining items with tuple storage).

Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
style skipping of index vacuuming that takes place based on flexible,
dynamic criteria.  The INDEX_CLEANUP=off case had to know about skipping
indexes up-front before now, due to a subtle interaction with the
special case (see commit dd695979) -- this was a special case unto
itself.  Now there are no special cases.  And so now it won't matter
when or how we decide to skip index vacuuming: it won't affect how
pruning behaves, and it won't be affected by any of the implementation
details of pruning or freezing.

Also remove XLOG_HEAP2_CLEANUP_INFO records.  These are no longer
necessary because we now rely entirely on heap pruning taking care of
recovery conflicts.  There is no longer any need to generate recovery
conflicts for DEAD tuples that pruning just missed.  This also means
that heap vacuuming now uses exactly the same strategy for recovery
conflicts as index vacuuming always has: REDO routines never need to
process a latestRemovedXid from the WAL record, since earlier REDO of
the WAL record from pruning is sufficient in all cases.  The generic
XLOG_HEAP2_CLEAN record type is now split into two new record types to
reflect this new division (these are called XLOG_HEAP2_PRUNE and
XLOG_HEAP2_VACUUM).

Also stop acquiring a super-exclusive lock for heap pages when they're
vacuumed during VACUUM's second heap pass.  A regular exclusive lock is
enough.  This is correct because heap page vacuuming is now strictly a
matter of setting the LP_DEAD line pointers to LP_UNUSED.  No other
backend can have a pointer to a tuple located in a pinned buffer that
can be invalidated by a concurrent heap page vacuum operation.

Heap vacuuming can now be thought of as conceptually similar to index
vacuuming and conceptually dissimilar to heap pruning.  Heap pruning now
has sole responsibility for anything involving the logical contents of
the database (e.g., managing transaction status information, recovery
conflicts, considering what to do with HOT chains).  Index vacuuming and
heap vacuuming are now only concerned with recycling garbage items from
physical data structures that back the logical database.

Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
changes.

Credit for the idea of retrying pruning a page to avoid the tupgone case
goes to Andres Freund.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
2021-04-06 08:49:22 -07:00
Amit Kapila ac4645c015 Allow pgoutput to send logical decoding messages.
The output plugin accepts a new parameter (messages) that controls if
logical decoding messages are written into the replication stream. It is
useful for those clients that use pgoutput as an output plugin and needs
to process messages that were written by pg_logical_emit_message().

Although logical streaming replication protocol supports logical
decoding messages now, logical replication does not use this feature yet.

Author: David Pirotte, Euler Taveira
Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
2021-04-06 08:40:47 +05:30