Commit Graph

4326 Commits

Author SHA1 Message Date
Thomas Munro c6f2f01611 Fix pg_basebackup with in-place tablespaces.
Previously, pg_basebackup from a cluster that contained an 'in-place'
tablespace, as introduced by commit 7170f215, would produce a harmless
warning on Unix and fail completely on Windows.

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com
2022-03-15 14:01:23 +13:00
Peter Geoghegan 6e20f4600a VACUUM VERBOSE: tweak scanned_pages logic.
Commit 872770fd6c taught VACUUM VERBOSE and autovacuum logging to
display the total number of pages scanned by VACUUM.  This information
was also displayed as a percentage of rel_pages in parenthesis, which
makes it easy to spot trends over time and across tables.

The instrumentation displayed "0 scanned (0.00% of total)" for totally
empty tables.  Tweak the instrumentation: have it show "0 scanned
(100.00% of total)" for empty tables instead.  This approach is clearer
and more consistent.
2022-03-13 13:07:49 -07:00
Peter Geoghegan e370f100f0 vacuumlazy.c: Standardize rel_pages terminology.
VACUUM's rel_pages field indicates the size of the target heap rel just
after the table_relation_vacuum() operation began.  There are specific
expectations around how rel_pages can be related to other nearby state.
In particular, the range of rel_pages must contain every tuple in the
relation whose tuple headers might contain an XID < OldestXmin.

Consistently refer to the field as rel_pages to make this clearer and
more discoverable.

This is follow-up work to commit 73f6ec3d from earlier today.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220311031351.sbge5m2bpvy2ttxg@alap3.anarazel.de
2022-03-12 13:20:45 -08:00
Peter Geoghegan 73f6ec3d3c vacuumlazy.c: document vistest and OldestXmin.
Explain the relationship between vacuumlazy.c's vistest and OldestXmin
cutoffs.  These closely related cutoffs are different in subtle but
important ways.  Also document a closely related rule: we must establish
rel_pages _after_ OldestXmin to ensure that no XID < OldestXmin can be
missed by lazy_scan_heap().

It's easier to explain these issues by initializing everything together,
so consolidate initialization of vacrel state.  Now almost every vacrel
field is initialized by heap_vacuum_rel().  The only remaining exception
is the dead_items array, which is still managed by lazy_scan_heap() due
to interactions with how we initialize parallel VACUUM.

Also move the process that updates pg_class entries for each index into
heap_vacuum_rel(), and adjust related assertions.  All pg_class updates
now take place after lazy_scan_heap() returns, which seems clearer.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WznYsUxVT156rCQ+q=YD4S4=1M37hWvvHLz-H1pwSM8-Ew@mail.gmail.com
2022-03-12 12:52:38 -08:00
Peter Geoghegan 5b68f75e12 Normalize heap_prepare_freeze_tuple argument name.
We called the argument totally_frozen in its function prototype as well
as in code comments, even though totally_frozen_p was used in the
function definition.  Standardize on totally_frozen.
2022-03-11 19:30:21 -08:00
Michael Paquier e9537321a7 Add support for zstd with compression of full-page writes in WAL
wal_compression gains a new value, "zstd", to allow the compression of
full-page images using the compression method of the same name.

Compression is done using the default level recommended by the library,
as of ZSTD_CLEVEL_DEFAULT = 3.  Some benchmarking has shown that it
could make sense to use a level lower for the FPI compression, like 1 or
2, as the compression rate did not change much with a bit less CPU
consumed, but any tests done would only cover few scenarios so it is
hard to come to a clear conclusion.  Anyway, there is no reason to not
use the default level instead, which is the level recommended by the
library so it should be fine for most cases.

zstd outclasses easily pglz, and is better than LZ4 where one wants to
have more compression at the cost of extra CPU but both are good enough
in their own scenarios, so the choice between one or the other of these
comes to a study of the workload patterns and the schema involved,
mainly.

This commit relies heavily on 4035cd5, that reshaped the code creating
and restoring full-page writes to be aware of the compression type,
making this integration straight-forward.

This patch borrows some early work from Andrey Borodin, though the patch
got a complete rewrite.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
2022-03-11 12:18:53 +09:00
Michael Paquier 0071fc7127 Fix header inclusion order in xloginsert.c with lz4.h
Per project policy, all system and library headers need to be declared
in the backend code after "postgres.h" and before the internal headers,
but 4035cd5 broke this policy when adding support for LZ4 in
wal_compression.

Noticed while reviewing the patch to add support for zstd in this area.
This only impacts HEAD, so there is no need for a back-patch.
2022-03-11 10:59:47 +09:00
Tom Lane 46ab07ffda Clean up assorted failures under clang's -fsanitize=undefined checks.
Most of these are cases where we could call memcpy() or other libc
functions with a NULL pointer and a zero count, which is forbidden
by POSIX even though every production version of libc allows it.
We've fixed such things before in a piecemeal way, but apparently
never made an effort to try to get them all.  I don't claim that
this patch does so either, but it gets every failure I observe in
check-world, using clang 12.0.1 on current RHEL8.

numeric.c has a different issue that the sanitizer doesn't like:
"ln(-1.0)" will compute log10(0) and then try to assign the
resulting -Inf to an integer variable.  We don't actually use the
result in such a case, so there's no live bug.

Back-patch to all supported branches, with the idea that we might
start running a buildfarm member that tests this case.  This includes
back-patching c1132aae3 (Check the size in COPY_POINTER_FIELD),
which previously silenced some of these issues in copyfuncs.c.

Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
2022-03-03 18:13:24 -05:00
Michael Paquier 62ce0c758d Fix catalog data of pg_stop_backup(), labelled v2
This function has been incorrectly marked as a set-returning function
with prorows (estimated number of rows) set to 1 since its creation in
7117685, that introduced non-exclusive backups.  There is no need for
that as the function is designed to return only one tuple.

This commit fixes the catalog definition of pg_stop_backup_v2() so as it
is not marked as proretset anymore, with prorows set to 0.  This
simplifies its internals by removing one tuplestore (used for one single
record anyway) and by removing all the checks related to a set-returning
function.

Issue found during my quest to simplify some of the logic used in
in-core system functions.

Bump catalog version.

Reviewed-by: Aleksander Alekseev, Kyotaro Horiguchi
Discussion: https://postgr.es/m/Yh8guT78f1Ercfzw@paquier.xyz
2022-03-03 10:51:57 +09:00
Tom Lane 12d768e704 Don't use static storage for SaveTransactionCharacteristics().
This is pretty queasy-making on general principles, and the more so
once you notice that CommitTransactionCommand() is actually stomping
on the values saved by _SPI_commit().  It's okay as long as the
active values didn't change during HoldPinnedPortals(); but that's
a larger assumption than I think we want to make, especially since
the fix is so simple.

Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us
2022-02-28 12:54:12 -05:00
Peter Geoghegan 73c61a50a1 vacuumlazy.c: Remove obsolete num_tuples field.
Commit 49c9d9fc unified VACUUM VERBOSE and autovacuum logging.  It
neglected to remove an old vacrel field that was only used by the old
VACUUM VERBOSE, so remove it now.

The previous num_tuples approach doesn't seem to have any real advantage
over the approach VACUUM VERBOSE takes now (also the approach used by
the autovacuum logging code), which is to show new_rel_tuples.
new_rel_tuples is the possibly-estimated total number of tuples left in
the table, whereas num_tuples meant the number of tuples encountered
during the VACUUM operation, after pruning, without regard for tuples
from pages skipped via the visibility map.

In passing, reorder a related vacrel field for consistency.
2022-02-24 19:01:54 -08:00
Peter Geoghegan cf879d3069 Remove unnecessary heap_tuple_needs_freeze argument.
The buffer argument hasn't been used since the function was first added
by commit bbb6e559c4.  The sibling heap_prepare_freeze_tuple function
doesn't have such an argument either.  Remove it.
2022-02-24 18:31:07 -08:00
Heikki Linnakangas 6c46e8a5df Fix data loss on crash after sorted GiST index build.
If a checkpoint happens during sorted GiST index build, and the system
crashes after the checkpoint and after the index build has finished,
the data written to the index before the checkpoint started could be
lost. The checkpoint won't fsync it, and it won't be replayed at crash
recovery either. Fix by calling smgrimmedsync() after the index build,
just like in B-tree index build.

Backpatch to v14 where the sorted GiST index build was introduced.

Reported-by: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZJJynimxKj5xYBSziL62-iEtPE+fx-B=JzR=jUtP92mw@mail.gmail.com
2022-02-24 16:15:12 +02:00
Andres Freund 2776922201 Assert in init_toast_snapshot() that some snapshot registered or active.
Commit <FIXME> fixed the bug that RemoveTempRelationsCallback() did not
push/register a snapshot. That only went unnoticed because often a valid
catalog snapshot exists and is returned by GetOldestSnapshot(). But due to
invalidation processing that is not reliable.

Thus assert in init_toast_snapshot() that there is a registered or active
snapshot, using the new HaveRegisteredOrActiveSnapshot().

Author: Andres Freund
Discussion: https://postgr.es/m/20220219180002.6tubjq7iw7m52bgd@alap3.anarazel.de
2022-02-21 08:58:29 -08:00
Heikki Linnakangas 69639e2b5c Fix uninitialized variable.
I'm very surprised the compiler didn't warn about it. But Coverity and
Valgrind did.
2022-02-20 18:33:50 +02:00
Michael Paquier d61a361d1a Remove all traces of tuplestore_donestoring() in the C code
This routine is a no-op since dd04e95 from 2003, with a macro kept
around for compatibility purposes.  This has led to the same code
patterns being copy-pasted around for no effect, sometimes in confusing
ways like in pg_logical_slot_get_changes_guts() from logical.c where the
code was actually incorrect.

This issue has been discussed on two different threads recently, so
rather than living with this legacy, remove any uses of this routine in
the C code to simplify things.  The compatibility macro is kept to avoid
breaking any out-of-core modules that depend on it.

Reported-by: Tatsuhito Kasahara, Justin Pryzby
Author: Tatsuhito Kasahara
Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com
Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com
2022-02-17 09:52:02 +09:00
Heikki Linnakangas 4620892344 Fix bogus log message when starting from a cleanly shut down state.
In commit 70e81861fa to split xlog.c, I moved the startup code that
updates the state in the control file and prints out the "database
system was not properly shut down" message to the log, but I
accidentally removed the "if (InRecovery)" check around it. As a
result, that message was printed even if the system was cleanly shut
down, also during 'initdb'.

Discussion: https://www.postgresql.org/message-id/3357075.1645031062@sss.pgh.pa.us
2022-02-16 23:15:08 +02:00
Heikki Linnakangas 9ed87a78e0 Fix read beyond buffer bug introduced by the split xlog.c patch.
FinishWalRecovery() copied the valid part of the last WAL block into a
palloc'd buffer, and the code in StartupXLOG() copied it to the WAL
buffer. But the memcpy in StartupXLOG() copied a full 8kB block, not
just the valid part, i.e. it copied from beyond the end of the buffer.
The invalid part was cleared immediately afterwards, so as long as the
memory was allocated and didn't segfault, it didn't do any harm, but
it can definitely segfault.

Discussion: https://www.postgresql.org/message-id/efc12e32-5af2-3485-5b1d-5af9f707491a@iki.fi
2022-02-16 12:01:32 +02:00
Heikki Linnakangas 70e81861fa Split xlog.c into xlog.c and xlogrecovery.c.
This moves the functions related to performing WAL recovery into the new
xlogrecovery.c source file, leaving xlog.c responsible for maintaining
the WAL buffers, coordinating the startup and switch from recovery to
normal operations, and other miscellaneous stuff that have always been in
xlog.c.

Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16 09:30:38 +02:00
Heikki Linnakangas be1c00ab13 Move code around in StartupXLOG().
This is in preparation for the next commit, which will split off
recovery-related code from xlog.c into a new source file. This is the
order that things will happen with the next commit, and the point of
this commit is to make these ordering changes more explicit, while the
next commit mechanically moves the source code to the new file. To aid
review, I added "BEGIN/END function" comments to mark which blocks of
code are moved to which functions in the next commit. They will be gone
in the next commit.

Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16 09:22:44 +02:00
Heikki Linnakangas b3a5d01c05 Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD.
Set it directly in CreateOverwriteContrecordRecord(). That way,
AdvanceXLInsertBuffer() doesn't need the missingContrecPtr global
variable. This is in preparation for splitting xlog.c into multiple
files.

Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/a462d79c-cb5a-47cc-e9ac-616b5003965f%40iki.fi
2022-02-16 09:22:41 +02:00
Heikki Linnakangas d231be00cb Run pgindent on xlog.c.
To tidy up after some recent refactorings in xlog.c. These would be
fixed by the pgindent run we do at the end of the development cycle,
but I want to clean these up now as I'm about to do some more big
refactorings on xlog.c.
2022-02-16 09:22:34 +02:00
Peter Geoghegan 988ffc3063 Update "don't truncate with failsafe" rationale.
There is a very good (though non-obvious) reason to avoid relation
truncation during a VACUUM that has triggered the failsafe mechanism,
which was missed before now.  Update related comments, so this isn't
forgotten.

Reported-By: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/CAFBsxsFiMPxQ-dHZ8tOgktn=+ffeJT3+GinZ4zdOGbmAnCYadA@mail.gmail.com
2022-02-15 15:16:19 -08:00
Amit Kapila 5e01001ffb WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.

Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
2022-02-14 08:55:58 +05:30
Michael Paquier c963e84fb8 Make origin data initialization consistent other fields in 2PC header
As of 1eb6d65, the origin data is optionally stored in a 2PC file
header, with the data filled in EndPrepare() even in the default case
where there is no origin data to add.  This was inconsistent with all
the other fields of TwoPhaseFileHeader which are initialized in
StartPrepare(), so move the initialization of origin_lsn and
origin_timestamp there instead.  The effect of missing the
initialization at this early stage is only cosmetic based on the current
logic of the code, but could have led to issues in the long-term, and it
is more consistent done this way.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAooECJ+gU_RZB-yhioPOV94R4ucoHAf68PiJhLpgpVpBw@mail.gmail.com
2022-02-14 09:30:35 +09:00
Tom Lane 302612a6c7 Silence minor compiler warnings.
Depending on compiler version and optimization level, we might
get a complaint that lazy_scan_heap's "freespace" is used
uninitialized.

Compilers not aware that ereport(ERROR) doesn't return complained
about bbsink_lz4_new().

Assigning "-1" to a uint64 value has unportable results; fortunately,
the value of xlogreadsegno is unimportant when xlogreadfd is -1.
(It looks to me like there is no need for xlogreadsegno to be static
in the first place, but I didn't venture to change that.)
2022-02-13 13:06:55 -05:00
Peter Geoghegan efa4a9462a Consolidate VACUUM xid cutoff logic.
Push the logic for determining whether or not a VACUUM operation will be
aggressive down into vacuum_set_xid_limits().  This makes the function's
signature significantly simpler, and seems clearer overall.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
2022-02-11 18:26:15 -08:00
Peter Geoghegan 872770fd6c Add VACUUM instrumentation for scanned pages, relfrozenxid.
Report on scanned pages within VACUUM VERBOSE and autovacuum logging.
These are pages that were physically examined during the VACUUM
operation.  Note that this can include a small number of pages that were
marked all-visible in the visibility map by some earlier VACUUM
operation.  VACUUM won't skip all-visible pages that aren't part of a
range of all-visible pages that's at least 32 blocks in length (partly
to avoid missing out on opportunities to advance relfrozenxid during
non-aggressive VACUUMs).

Commit 44fa8488 simplified the definition of scanned pages.  It became
the complement of the pages (of those pages from rel_pages) that were
skipped using the visibility map.  And so scanned pages precisely
indicates how effective the visibility map was at saving work.  (Before
now we displayed the number of pages skipped via the visibility map when
happened to be frozen pages, but not when they were merely all-visible,
which was less useful to users.)

Rename the user-visible OldestXmin output field to "removal cutoff", and
show some supplementary information: how far behind the cutoff is
(number of XIDs behind) by the time the VACUUM operation finished.  This
will help users to figure out what's _not_ working in extreme cases
where VACUUM is fundamentally unable to remove dead tuples or freeze
older tuples (e.g., due to a leaked replication slot).  Also report when
relfrozenxid is advanced by VACUUM in output that immediately follows
"removal cutoff".  This structure is intended to highlight the
relationship between the new relfrozenxid value for the table, and the
VACUUM operation's removal cutoff.

Finally, add instrumentation of "missed dead tuples", and the number of
pages that had at least one such tuple.  These are fully DEAD (not just
RECENTLY_DEAD) tuples with storage that could not be pruned due to
failure to acquire a cleanup lock on a heap page.  This is a replacement
for the "skipped due to pin" instrumentation removed by commit 44fa8488.
It shows more details than before for pages where failing to get a
cleanup lock actually resulted in VACUUM missing out on useful work, but
usually shows nothing at all instead (the mere fact that we couldn't get
a cleanup lock is usually of no consequence whatsoever now).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com
2022-02-11 16:48:40 -08:00
Peter Geoghegan 44fa84881f Simplify lazy_scan_heap's handling of scanned pages.
Redefine a scanned page as any heap page that actually gets pinned by
VACUUM's first pass over the heap, regardless of whether or not the page
was cleanup locked.  Although it's fundamentally impossible to prune a
heap page without a cleanup lock (since we cannot safely defragment the
page), we can do just about everything else.  The only notable further
exception is freezing tuples, though even that is arguably a consequence
of not being able to prune (not a separate issue).

VACUUM now does as much of the same processing as possible for pages
that could not be cleanup locked.  Any failure to do specific required
processing is treated as a special case exception, which will be rare in
practice.  We now collect any preexisting LP_DEAD items (left behind by
earlier opportunistic pruning) in the dead_items array for these heap
pages, and count their tuples in the usual way.  Steps used to decide if
we'll attempt relation truncation are performed in the usual way for
no-cleanup-lock scanned pages, too.

Although eliminating these special cases is intrinsically useful, it's
even more useful as an enabler of further simplifications.  The only
essential difference between aggressive and non-aggressive is that only
aggressive is _guaranteed_ to be able to advance relfrozenxid up to
FreezeLimit.  Advancing relfrozenxid is always useful, but before now
non-aggressive VACUUMs threw away the opportunity to do so whenever a
cleanup lock could not be acquired on any page, no matter what the
details were.  This was very pessimistic.

It isn't actually necessary to "behave aggressively" to maintain the
ability to advance relfrozenxid when a cleanup lock isn't immediately
available (most of the time).  The non-aggressive case will now make
sure that it isn't safe to advance relfrozenxid (without waiting) using
only a share lock.  It will usually notice that there are no tuples that
need to be frozen anyway, just like in the aggressive case -- and so it
no longer wastes an opportunity to advance relfrozenxid over nothing.
(The non-aggressive case still won't wait for a cleanup lock when there
really are tuples on the page that need to be frozen, since that really
would amount to "behaving aggressively".)

VACUUM currently has a tendency to set heap pages to all-visible in the
visibility map before it freezes all of the tuples on the page.  Only a
subsequent aggressive VACUUM will visit these pages to freeze their
tuples, usually only when the tuple XIDs are much older than the
vacuum_freeze_min_age GUC (FreezeLimit cutoff) is supposed to allow.
And so non-aggressive VACUUMs are still far less likely to be able to
advance relfrozenxid in practice, even with the enhancements from this
commit.  This remaining issue will be addressed by future work that
overhauls the criteria for freezing tuples.  Once that's in place,
almost every VACUUM operation will be able to advance relfrozenxid in
practice.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com
2022-02-11 14:32:17 -08:00
Michael Paquier 0147fc7c8c Fix typo in multixact.c
Introduced in aa64f23.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220209175338.GB1627503@nathanxps13
2022-02-10 10:45:14 +09:00
Robert Haas aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Alexander Korotkov f1ea98a797 Reduce non-leaf keys overlap in GiST indexes produced by a sorted build
The GiST sorted build currently chooses split points according to the only page
space utilization.  That may lead to higher non-leaf keys overlap and, in turn,
slower search query answers.

This commit makes the sorted build use the opclass's picksplit method.  Once
four pages at the level are accumulated, the picksplit method is applied until
each split partition fits the page.  Some of our split algorithms could show
significant performance degradation while processing 4-times more data at once.
But those opclasses haven't received the sorted build support and shouldn't
receive it before their split algorithms are improved.

Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com
Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin
Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund
Reviewed-by: Alexander Korotkov
2022-02-07 23:20:42 +03:00
Robert Haas 5ef1eefd76 Allow archiving via loadable modules.
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.

Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.

Nathan Bossart, with a little bit of kibitzing by me.

Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-02-03 14:05:02 -05:00
Peter Eisentraut 94aa7cc5f7 Add UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.

Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
2022-02-03 11:48:21 +01:00
Alvaro Herrera b3d7d6e462
Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Peter Geoghegan bf42fcace5 vacuumlazy.c: Rename state field for consistency.
Rename pages_removed to removed_pages, for consistency with nearby
vacrel fields.
2022-01-28 17:41:09 -08:00
Michael Paquier 741bd32933 Improve errors related to incorrect TLI on checkpoint record replay
WAL replay would cause a hard crash if the timeline expected by a
XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a
XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being
replayed, using the same error message for all three of them.  This
commit changes those error messages to use different wordings, adapted
to each record type, which is useful when it comes to the debugging of
an issue in this area.

Author: Amul Sul
Reviewed-by: Nathan Bossart, Robert Haas
Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com
2022-01-25 13:37:19 +09:00
Michael Paquier 410aa248e5 Fix various typos, grammar and code style in comments and docs
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas.  Most fixes are related to some recent
additions, as of the development of v15.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-25 09:40:04 +09:00
Andres Freund 1fabec7d7c fsync pg_logical/mappings in CheckPointLogicalRewriteHeap().
While individual logical rewrite files were synced to disk, the directory was
not. On some filesystems that could lead to loosing directory entries after a
crash.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com
Backpatch: 10-
2022-01-21 11:22:55 -08:00
Michael Paquier 237d1f3172 Fix one-off bug causing missing commit timestamps for subtransactions
The logic in charge of writing commit timestamps (enabled with
track_commit_timestamp) for subtransactions had a one-bug bug,
where it would be possible that commit timestamps go missing for the
last subtransaction committed.

While on it, simplify a bit the iteration logic in the loop writing the
commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom
Lane, so as some variable initializations are not part of the loop
itself.

Issue introduced in 73c986a.

Analyzed-by: Alex Kingsborough
Author: Alex Kingsborough, Kyotaro Horiguchi
Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com
Backpatch-through: 10
2022-01-21 14:54:04 +09:00
Peter Eisentraut b99ccd2cb2 Call pg_newlocale_from_collation() also with default collation
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0.  Instead, now we call it anyway and have it return
0 if the default collation was passed.  It already did this, so we
just have to adjust the callers.  This simplifies all the call sites
and also makes future enhancements easier.

After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
2022-01-20 09:50:18 +01:00
Jeff Davis 7a5f6b4748 Make logical decoding a part of the rmgr.
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.

In preparation for rmgr extensibility.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
2022-01-19 14:58:49 -08:00
Andres Freund c702d656a2 heap pruning: Only call BufferGetBlockNumber() once.
BufferGetBlockNumber() is not that cheap and obviously cannot change during
one heap_prune_page(), so only call it once. We might be able to do better and
pass the block number from the caller, but that'd be a larger change...

Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
2022-01-17 15:35:11 -08:00
Peter Geoghegan 49c9d9fcfa Unify VACUUM VERBOSE and autovacuum logging.
The log_autovacuum_min_duration instrumentation used its own dedicated
code for logging, which was not reused by VACUUM VERBOSE.  This was
highly duplicative, and sometimes led to each code path using slightly
different accounting for essentially the same information.

Clean things up by making VACUUM VERBOSE reuse the same instrumentation
code.  This code restructuring changes the structure of the VACUUM
VERBOSE output itself, but that seems like an overall improvement.  The
most noticeable change in VACUUM VERBOSE output is that it no longer
outputs a distinct message per index per round of index vacuuming.  Most
of the same information (about each index) is now shown in its new
per-operation summary message.  This is far more legible.

A few details are no longer displayed by VACUUM VERBOSE, but that's no
real loss in practice, especially in the common case where we don't need
multiple index scans/rounds of vacuuming.  This super fine-grained
information is still available via DEBUG2 messages, which might still be
useful in debugging scenarios.

VACUUM VERBOSE now shows new instrumentation, which is typically very
useful: all of the log_autovacuum_min_duration instrumentation that it
missed out on before now.  This includes information about WAL overhead,
buffers hit/missed/dirtied information, and I/O timing information.

VACUUM VERBOSE still retains a few INFO messages of its own.  This is
limited to output concerning the progress of heap rel truncation, as
well as some basic information about parallel workers.  These details
are still potentially quite useful.  They aren't a good fit for the log
output, which must summarize the whole operation.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com
2022-01-14 16:50:34 -08:00
Andres Freund bb42bfb5cc Assert redirect pointers are sensible after heap_page_prune().
Corruption of redirect item pointers often only becomes visible well after
being corrupted, as e.g. bug #17255 shows: In the original reproducer,
gigabyte of WAL were between the source of the corruption and the corruption
becoming visible.

To make it easier to find / prevent such bugs, verify whether redirect
pointers are sensible at the end of heap_page_prune_execute(). 5cd7eb1f1c
introduced related assertions while modifying the page, but they can't easily
detect marking the target of an existing redirect as unused. Sometimes the
corruption will be detected later, but that's harder to diagnose.

Author: Andres Freund <andres@andres@anarazel.de>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
2022-01-13 18:14:05 -08:00
Andres Freund 18b87b201f Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.
Since dc7420c2c9 the horizon used for pruning is determined "lazily". A more
accurate horizon is built on-demand, rather than in GetSnapshotData(). If a
horizon computation is triggered between two HeapTupleSatisfiesVacuum() calls
for the same tuple, the result can change from RECENTLY_DEAD to DEAD.

heap_page_prune() can process the same tid multiple times (once following an
update chain, once "directly"). When the result of HeapTupleSatisfiesVacuum()
of a tuple changes from RECENTLY_DEAD during the first access, to DEAD in the
second, the "tuple is DEAD and doesn't chain to anything else" path in
heap_prune_chain() can end up marking the target of a LP_REDIRECT ItemId
unused.

Initially not easily visible,
Once the target of a LP_REDIRECT ItemId is marked unused, a new tuple version
can reuse it. At that point the corruption may become visible, as index
entries pointing to the "original" redirect item, now point to a unrelated
tuple.

To fix, compute HTSV for all tuples on a page only once. This fixes the entire
class of problems of HTSV changing inside heap_page_prune(). However,
visibility changes can obviously still occur between HTSV checks inside
heap_page_prune() and outside (e.g. in lazy_scan_prune()).

The computation of HTSV is now done in bulk, in heap_page_prune(), rather than
on-demand in heap_prune_chain(). Besides being a bit simpler, it also is
faster: Memory accesses can happen sequentially, rather than in the order of
HOT chains.

There are other causes of HeapTupleSatisfiesVacuum() results changing between
two visibility checks for the same tuple, even before dc7420c2c9. E.g.
HEAPTUPLE_INSERT_IN_PROGRESS can change to HEAPTUPLE_DEAD when a transaction
aborts between the two checks. None of the these other visibility status
changes are known to cause corruption, but heap_page_prune()'s approach makes
it hard to be confident.

A patch implementing a more fundamental redesign of heap_page_prune(), which
fixes this bug and simplifies pruning substantially, has been proposed by
Peter Geoghegan in
https://postgr.es/m/CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com

However, that redesign is larger change than desirable for backpatching. As
the new design still benefits from the batched visibility determination
introduced in this commit, it makes sense to commit this narrower fix to 14
and master, and then commit Peter's improvement in master.

The precise sequence required to trigger the bug is complicated and hard to do
exercise in an isolation test (until we have wait points). Due to that the
isolation test initially posted at
https://postgr.es/m/20211119003623.d3jusiytzjqwb62p%40alap3.anarazel.de
and updated in
https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb%40alap3.anarazel.de
isn't committable.

A followup commit will introduce additional assertions, to detect problems
like this more easily.

Bug: #17255
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Debugged-By: Andres Freund <andres@anarazel.de>
Debugged-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@andres@anarazel.de>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
Backpatch: 14-, the oldest branch containing dc7420c2c9
2022-01-13 18:13:41 -08:00
Peter Geoghegan e9b873f667 vacuumlazy.c: fix "garbage tuples" reference.
Another minor oversight in commit 4f8d9d12.
2022-01-12 14:13:35 -08:00
Amit Kapila dbfa1022e4 Fix typo in rewriteheap.c.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACW7SvfFW8r2uKH6oQm1kNpt8aQMG61kSBPK0S2PHhFbMw@mail.gmail.com
2022-01-11 10:50:18 +05:30
Peter Eisentraut ee41960738 Rename functions to avoid future conflicts
Rename range_serialize/range_deserialize to
brin_range_serialize/brin_range_deserialize, since there are already
public range_serialize/range_deserialize in rangetypes.h.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/CA+renyX0ipvY6A_jUOHeB1q9mL4bEYfAZ5FBB7G7jUo5bykjrA@mail.gmail.com
2022-01-10 09:37:43 +01:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00