Commit Graph

Tom Lane 0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Michael Paquier 8961cb9a03 Fix typos in comments
The changes done in this commit impact comments with no direct
user-visible changes, with fixes for incorrect function, variable or
structure names.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
2023-05-02 12:23:08 +09:00
Peter Geoghegan e944063294 Fix xl_heap_lock WAL record field's data type.
Make xl_heap_lock's infobits_set field of type uint8, not int8.  Using
int8 isn't appropriate given that the field just holds status bits.
This fixes an oversight in commit 0ac5ad5134.

In passing rename the nearby TransactionId field to "xmax" to make
things consistent with related records, such as xl_heap_lock_updated.

Deliberately avoid a bump in XLOG_PAGE_MAGIC.  No backpatch, either.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com
2023-04-11 14:07:54 -07:00
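
To illustrate the point of the fix: status bits stored in a signed 8-bit
field sign-extend when widened in an expression, so unrelated high bits
appear set. A minimal standalone sketch, using stdint.h types in place of
Postgres's own int8/uint8 typedefs:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* A status-bits field with the high bit set. */
        int8_t  signed_bits = (int8_t) 0x80;
        uint8_t unsigned_bits = 0x80;

        /* Widening the signed field sign-extends (0x80 -> 0xffffff80);
         * the unsigned field widens cleanly (0x80 -> 0x00000080). */
        printf("signed:   0x%08x\n", (unsigned) (int32_t) signed_bits);
        printf("unsigned: 0x%08x\n", (unsigned) unsigned_bits);
        return 0;
    }
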
Andres Freund 26669757b6 Handle logical slot conflicts on standby
During WAL replay on the standby, when a conflict with a logical slot is
identified, invalidate such slots. There are two sources of conflicts:
1) Using the information added in 6af1793954, logical slots are invalidated if
   required rows are removed
2) wal_level on the primary server is reduced to below logical

Uses the infrastructure introduced in the prior commit. FIXME: add commit
reference.

Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to
interrupt use of a slot, if called in the startup process. The new recovery
conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the new column to pg_stat_database_conflicts.
Bumps PGSTAT_FILE_FORMAT_ID for the same reason.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
2023-04-08 00:05:44 -07:00
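
The conflict test itself reduces to a wraparound-aware XID comparison.
A minimal sketch of that check, with illustrative names; the real code
uses TransactionIdPrecedesOrEquals() and also special-cases permanent
XIDs:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* Wraparound-aware "a <= b", in the style of
     * TransactionIdPrecedesOrEquals(). */
    static bool
    xid_precedes_or_equals(TransactionId a, TransactionId b)
    {
        return (int32_t) (a - b) <= 0;
    }

    int main(void)
    {
        TransactionId catalog_xmin = 1000;  /* oldest catalog row the slot needs */
        TransactionId horizon = 1500;       /* conflict horizon from the WAL record */

        /* Rows the slot still needs may have been removed by replay:
         * invalidate the slot. */
        if (xid_precedes_or_equals(catalog_xmin, horizon))
            printf("slot conflicts with replay: invalidate\n");
        return 0;
    }
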
Andres Freund 00d1e02be2 hio: Use ExtendBufferedRelBy() to extend tables more efficiently
While we already had some form of bulk extension for relations, it was fairly
limited. It only amortized the cost of acquiring the extension lock; the
relation itself was still extended one block at a time. Bulk extension was
also solely triggered by contention, not by the amount of data inserted.

To address this, use ExtendBufferedRelBy(), introduced in 31966b151e, to
extend the relation. We try to extend the relation by multiple blocks in two
situations:

1) The caller tells RelationGetBufferForTuple() that it will need multiple
   pages. For now that's only used by heap_multi_insert(), see commit FIXME.

2) If there is contention on the extension lock, use the number of waiters for
   the lock as a multiplier for the number of blocks to extend by. This is
   similar to what we already did. Previously we additionally multiplied the
   number of waiters by 20, but with the new relation extension
   infrastructure I could not see a benefit in doing so.

Using the freespacemap to provide empty pages can cause significant
contention, and it adds measurable overhead even when there is no contention. To
reduce that, remember the blocks the relation was extended by in the
BulkInsertState, in the extending backend. In case 1) from above, the blocks
the extending backend needs are not entered into the FSM, as we know that we
will need those blocks.

One complication with using the FSM to record empty pages is that we need to
insert blocks into the FSM when we already hold a buffer content lock. To
avoid doing IO while holding a content lock, release the content lock before
recording free space. Currently that opens a small window in which another
backend could fill the block, if a concurrent VACUUM records the free
space. If that happens, we retry, similar to the already existing case when
otherBuffer is provided. In the future it might be worth closing the race by
preventing VACUUM from recording the space in newly extended pages.

This change provides very significant wins (3x at 16 clients, on my
workstation) for concurrent COPY into a single relation. Even single threaded
COPY is measurably faster, primarily due to not dirtying pages while
extending, if supported by the operating system (see commit 4d330a61bb). Even
single-row INSERTs benefit, although to a much smaller degree, as the relation
extension lock rarely is the primary bottleneck.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
2023-04-06 16:53:17 -07:00
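
As a rough model of the two triggers above, a hedged sketch; the names
and the cap are illustrative, not the actual constants in hio.c:

    #include <stdio.h>

    /* Extend by at least the pages the caller said it needs, scaled up by
     * the number of backends waiting on the extension lock, with a cap so
     * a single extension stays bounded. */
    static unsigned
    blocks_to_extend_by(unsigned pages_needed, unsigned lock_waiters)
    {
        unsigned extend_by = pages_needed + lock_waiters;

        return extend_by > 64 ? 64 : extend_by;
    }

    int main(void)
    {
        printf("uncontended: %u\n", blocks_to_extend_by(1, 0));
        printf("contended:   %u\n", blocks_to_extend_by(1, 7));
        return 0;
    }
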
Andres Freund 5279e9db8e heapam: Pass number of required pages to RelationGetBufferForTuple()
A future commit will use this information to determine how aggressively to
extend the relation. In heap_multi_insert() we know exactly how many
pages we need once we have to extend the relation, providing an accurate
lower bound for how much to extend by.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
2023-04-06 16:17:16 -07:00
Andres Freund 6af1793954 Add info in WAL records in preparation for logical slot conflict handling
This commit only implements one prerequisite part for allowing logical
decoding. The commit message contains an explanation of the overall design,
which later commits will refer back to.

Overall design:

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby. To prevent those errors, a new replication
conflict scenario needs to be addressed (much as hot standby already does
for other kinds of conflicts).

2. Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.

3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access. That way, during WAL replay, we know when there is a risk of
conflict and, if so, whether a conflict has actually occurred.

4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.

5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.

Why do we need this for logical decoding on standby?

First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.

This is done thanks to the catalog_xmin associated with the logical
replication slot.

But with logical decoding on standby, in the following cases:

- hot_standby_feedback is off
- hot_standby_feedback is on but there is no physical slot between
  the primary and the standby. Then hot_standby_feedback will work,
  but only while the connection is alive (for example a node restart
  would break it)

the primary may delete system catalog rows that could be needed by
the logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).

So it's mandatory to identify those rows and invalidate any slots that
may still need them. Identifying those rows is the purpose of this
commit.

Implementation:

When WAL replay on the standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, that means the slot's catalog_xmin conflicts with the
xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field,
isCatalogRel, to such WAL records (and a new bit set in the
xl_heap_visible flags field), which is true for catalog tables, so as
to arrange for conflict handling.

The affected WAL records are the ones that already contain the
snapshotConflictHorizon field, namely:

- gistxlogDelete
- gistxlogPageReuse
- xl_hash_vacuum_one_page
- xl_heap_prune
- xl_heap_freeze_page
- xl_heap_visible
- xl_btree_reuse_page
- xl_btree_delete
- spgxlogVacuumRedirect

Because of the new field, xl_hash_vacuum_one_page and gistxlogDelete
now store the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This
is needed to ensure correct alignment; it is not needed in the other
structs to which isCatalogRel has been added.

This commit just introduces the WAL format changes mentioned above. Handling
the actual conflicts will follow in future commits.

Bumps XLOG_PAGE_MAGIC, as several WAL records are changed.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com>  (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2023-04-02 12:32:19 -07:00
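
To illustrate the alignment point: with the offsets stored as a flexible
array member, their start is simply the padded sizeof() of the fixed
part, so adding the flag cannot misalign them. A hypothetical record
layout in the spirit of the change (field names other than isCatalogRel
are invented):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;
    typedef uint16_t OffsetNumber;

    typedef struct
    {
        TransactionId snapshotConflictHorizon;
        uint16_t      ntuples;
        _Bool         isCatalogRel;   /* the newly added flag */
        OffsetNumber  offsets[];      /* FLEXIBLE_ARRAY_MEMBER */
    } xl_demo_delete;

    int main(void)
    {
        printf("fixed part: %zu bytes, offsets start at %zu\n",
               sizeof(xl_demo_delete), offsetof(xl_demo_delete, offsets));
        return 0;
    }
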
Andres Freund 61b313e47e Pass down table relation into more index relation functions
This is done in preparation for logical decoding on standby, which needs to
know whether visibility-affecting WAL records are about a (user) catalog
table. That is only known for the table, not the indexes.

It's also nice to be able to pass the heap relation to GlobalVisTestFor() in
vacuumRedirectAndPlaceholder().

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
2023-04-01 20:18:29 -07:00
Peter Geoghegan ae4fdde135 Count updates that move row to a new page.
Add pgstat counter to track row updates that result in the successor
version going to a new heap page, leaving behind an original version
whose t_ctid points to the new version.  The current count is shown by
the n_tup_newpage_upd column of each of the pg_stat_*_tables views.

The new n_tup_newpage_upd column complements the existing n_tup_hot_upd
and n_tup_upd columns.  Tables that have high n_tup_newpage_upd values
(relative to n_tup_upd) are good candidates for tuning heap fillfactor.

Corey Huinker, with small tweaks by me.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com
2023-03-23 11:16:17 -07:00
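
A toy sketch of how the three counters relate (purely illustrative
bookkeeping, not the pgstat code):

    #include <stdbool.h>
    #include <stdio.h>

    static long n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd;

    /* Every update counts toward n_tup_upd; HOT updates additionally
     * count toward n_tup_hot_upd, and non-HOT updates whose successor
     * version lands on a different heap page count toward
     * n_tup_newpage_upd. */
    static void
    count_update(bool is_hot, bool new_page)
    {
        n_tup_upd++;
        if (is_hot)
            n_tup_hot_upd++;
        else if (new_page)
            n_tup_newpage_upd++;
    }

    int main(void)
    {
        count_update(true, false);
        count_update(false, true);
        printf("upd=%ld hot=%ld newpage=%ld\n",
               n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd);
        return 0;
    }

Tables where n_tup_newpage_upd dominates n_tup_upd are the
fillfactor-tuning candidates the message describes.
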
Tomas Vondra 19d8e2308b Ignore BRIN indexes when checking for HOT updates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes, which hold no
references to individual tuples that would need to be cleaned up.

A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.

This also removes the rd_indexattr list and replaces it with an
rd_attrsvalid flag. The list was not used anywhere, and a simple flag is
sufficient.

This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.

Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.

Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2023-03-20 11:02:42 +01:00
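
A sketch of the three-way signal; the enum constants follow the commit's
description, while the decision function is an invented stand-in:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum
    {
        TU_None,          /* HOT update: no indexes need updating */
        TU_Summarizing,   /* update only summarizing (e.g. BRIN) indexes */
        TU_All            /* regular update: update all indexes */
    } TU_UpdateIndexes;

    /* Hypothetical decision: a non-HOT update touches every index; a HOT
     * update needs index work only if columns covered by summarizing
     * indexes changed. */
    static TU_UpdateIndexes
    index_update_kind(bool hot_applied, bool summarized_cols_changed)
    {
        if (!hot_applied)
            return TU_All;
        return summarized_cols_changed ? TU_Summarizing : TU_None;
    }

    int main(void)
    {
        printf("%d\n", (int) index_update_kind(true, true)); /* TU_Summarizing */
        return 0;
    }
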
Tom Lane 462bb7f128 Remove bms_first_member().
This function has been semi-deprecated ever since we invented
bms_next_member().  Its habit of scribbling on the input bitmapset
isn't great, plus for sufficiently large bitmapsets it would take
O(N^2) time to complete a loop.  Now we have the additional problem
that reducing the input to empty while leaving it still accessible
would violate a planned invariant.  So let's just get rid of it,
after updating the few extant callers to use bms_next_member().

Patch by me; thanks to Nathan Bossart and Richard Guo for review.

Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
2023-03-02 11:34:29 -05:00
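
The replacement idiom loops with a cursor instead of consuming the set.
A self-contained sketch over a single 64-bit word (the real Bitmapset
spans multiple words, but the loop shape is the same):

    #include <stdint.h>
    #include <stdio.h>

    /* Smallest member greater than prevbit, or -1: a one-word stand-in
     * for bms_next_member(), which never modifies its input. */
    static int
    next_member(uint64_t set, int prevbit)
    {
        for (int bit = prevbit + 1; bit < 64; bit++)
            if (set & (UINT64_C(1) << bit))
                return bit;
        return -1;
    }

    int main(void)
    {
        uint64_t set = (UINT64_C(1) << 3) | (UINT64_C(1) << 10) |
                       (UINT64_C(1) << 42);
        int x = -1;

        while ((x = next_member(set, x)) >= 0)
            printf("member %d\n", x);
        return 0;
    }

Each call resumes from the cursor, so a whole loop is linear in the set
size, avoiding both the destructive update and the O(N^2) behavior
described above.
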
David Rowley cfcf56f923 More refactoring of heapgettup() and heapgettup_pagemode()
Here we further simplify the code in heapgettup() and
heapgettup_pagemode() to make better use of the helper functions added in
the previous recent refactors in this area.

In passing, remove an unneeded cast added in 8ca6d49f6.

Author: Melanie Plageman
Reviewed-by: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2023-02-07 17:24:07 +13:00
David Rowley 7ae0ab0ad9 Reduce code duplication between heapgettup and heapgettup_pagemode
The code to get the next block number was exactly the same between these
two functions, so let's just put it into a helper function and call that
from both locations.

Author: Melanie Plageman
Reviewed-by: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-03 16:20:43 +13:00
David Rowley 8ca6d49f63 Add helper functions to simplify heapgettup code
Here we add heapgettup_start_page() and heapgettup_continue_page() to
simplify the code in the heapgettup() function.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-03 12:38:42 +13:00
David Rowley f9bc34fcb6 Further refactor of heapgettup and heapgettup_pagemode
Backward and forward scans share much of the same page acquisition code.
Here we consolidate that code to reduce some duplication.

Additionally, add a new rs_coffset field to HeapScanDescData to track the
offset of the current tuple.  The new field fits nicely into the padding
between a bool and BlockNumber field and saves having to look at the last
returned tuple to figure out which offset we should be looking at for the
current tuple.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-03 11:48:39 +13:00
David Rowley fb1a59de0c Refactor heapam.c adding heapgettup_initial_block function
Here we adjust heapgettup() and heapgettup_pagemode() to move the code
that fetches the first block number to scan out into a helper function.
This removes some code duplication.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-02 14:17:15 +13:00
David Rowley e9aaf06328 Remove dead NoMovementScanDirection code
Here we remove some dead code from heapgettup() and heapgettup_pagemode()
which was trying to support NoMovementScanDirection scans.  This code can
never be reached, as standard_ExecutorRun() never calls ExecutePlan with
NoMovementScanDirection.

Additionally, plans which were scanning an unordered index would use
NoMovementScanDirection rather than ForwardScanDirection.  There was no
real need for this, so here we adjust things to use ForwardScanDirection
for unordered index scans.  A comment in pathnodes.h claimed that
NoMovementScanDirection was used for PathKey reasons, but if that was
true, it no longer is, per code in build_index_paths().

This does change the non-text format of the EXPLAIN output so that
unordered index scans now have a "Forward" scan direction rather than
"NoMovement".  The text format of EXPLAIN has not changed.

Author: Melanie Plageman
Reviewed-by: Tom Lane, David Rowley
Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
2023-02-01 10:52:41 +13:00
Peter Geoghegan 6c6b497266 Revert "Add eager and lazy freezing strategies to VACUUM."
This reverts commit 4d41799261.  Broad
concerns about regressions caused by eager freezing strategy have been
raised.  Whether or not these concerns can be worked through in any time
frame is far from certain.

Discussion: https://postgr.es/m/20230126004347.gepcmyenk2csxrri@awork3.anarazel.de
2023-01-25 22:22:27 -08:00
Peter Geoghegan 4d41799261 Add eager and lazy freezing strategies to VACUUM.
Eager freezing strategy avoids large build-ups of all-visible pages.  It
makes VACUUM trigger page-level freezing whenever doing so will enable
the page to become all-frozen in the visibility map.  This is useful for
tables that experience continual growth, particularly strict append-only
tables such as pgbench's history table.  Eager freezing significantly
improves performance stability by spreading out the cost of freezing
over time, rather than doing most freezing during aggressive VACUUMs.
It complements the insert autovacuum mechanism added by commit b07642db.

VACUUM determines its freezing strategy based on the value of the new
vacuum_freeze_strategy_threshold GUC (or reloption) for logged tables.
Tables that exceed the size threshold use the eager freezing strategy.
Unlogged tables and temp tables always use the eager freezing strategy,
since the added cost is negligible there.  Non-permanent relations won't
incur any extra overhead in WAL written (for the obvious reason), nor in
pages dirtied (since any extra freezing will only take place on pages
whose PD_ALL_VISIBLE bit needed to be set either way).

VACUUM uses lazy freezing strategy for logged tables that fall under the
GUC size threshold.  Page-level freezing triggers based on the criteria
established in commit 1de58df4, which added basic page-level freezing.

Eager freezing is strictly more aggressive than lazy freezing.  Settings
like vacuum_freeze_min_age still get applied in just the same way in
every VACUUM, independent of the strategy in use.  The only mechanical
difference between eager and lazy freezing strategies is that only the
former applies its own additional criteria to trigger freezing pages.
Note that even lazy freezing strategy will trigger freezing whenever a
page happens to have required that an FPI be written during pruning,
provided that the page will thereby become all-frozen in the visibility
map afterwards (due to the FPI optimization from commit 1de58df4).

The vacuum_freeze_strategy_threshold default setting is 4GB.  This is a
relatively low setting that prioritizes performance stability.  It will
be reviewed at the end of the Postgres 16 beta period.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
2023-01-25 14:15:38 -08:00
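
The strategy choice described above amounts to a small predicate. A
hedged sketch with invented names; only the 4GB default comes from the
message:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Eager freezing for all non-permanent relations; for logged tables,
     * eager only once the table exceeds the size threshold. */
    static bool
    use_eager_freezing(bool is_permanent, uint64_t table_bytes,
                       uint64_t threshold_bytes)
    {
        if (!is_permanent)
            return true;
        return table_bytes >= threshold_bytes;
    }

    int main(void)
    {
        const uint64_t threshold = UINT64_C(4) << 30;   /* 4GB default */

        printf("10GB logged table: %s\n",
               use_eager_freezing(true, UINT64_C(10) << 30, threshold)
               ? "eager" : "lazy");
        return 0;
    }
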
David Rowley 009dbdea02 Run pgindent on heapam.c
An upcoming patch by Melanie Plageman does some refactoring work in this
area.  Run pgindent on that file now before making any changes so that
it's easier to maintain/evolve each of the individual patches doing the
refactor work.  Additionally, add a few new required typedefs to the list
to make it easier to do future pgindent runs on this file during the
refactor work.

Discussion: https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2023-01-23 23:08:38 +13:00
Peter Geoghegan 50767705ed Rename and relocate freeze plan dedup routines.
Rename the heapam.c freeze plan deduplication routines added by commit
9e540599 to names that follow conventions for functions in heapam.c.
Also relocate the functions so that they're next to their caller, which
runs during original execution, when FREEZE_PAGE WAL records are built.

The routines were initially placed next to (and followed the naming
conventions of) conceptually related REDO routine code, but that scheme
turned out to be kind of jarring when considered in a wider context.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230109214308.icz26oqvt3k2274c@awork3.anarazel.de
2023-01-11 17:30:42 -08:00
Peter Geoghegan eb5ad4ff05 Check that xmax didn't commit in freeze check.
We cannot rely on TransactionIdDidAbort here, since in general it may
report transactions that were in-progress at the time of an earlier hard
crash as not aborted, effectively behaving as if they were still in
progress even after crash recovery completes.  Go back to defensively
verifying that xmax didn't commit instead.

Oversight in commit 79d4bf4e.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230104035636.hy5djyr2as4gbc4q@awork3.anarazel.de
2023-01-03 21:48:27 -08:00
Peter Geoghegan 79d4bf4eff Delay commit status checks until freezing executes.
pg_xact lookups are relatively expensive.  Move the xmin/xmax commit
status checks from the point that freeze plans are prepared to the point
that they're actually executed.  Otherwise we'll repeat many commit
status checks whenever multiple successive VACUUM operations scan the
same pages and decide against freezing each time, which is a waste of
cycles.

Oversight in commit 1de58df4, which added page-level freezing.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
2023-01-03 11:22:36 -08:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Peter Eisentraut e351f85418 Push lpp variable closer to usage in heapgetpage()
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2023-01-02 09:39:31 +01:00
Peter Geoghegan 1de58df4fe Add page-level freezing to VACUUM.
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages.  Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.

Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes.  The overall amount of WAL written is the
single most important freezing related cost, in general.  Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.

Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI.  We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early.  This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.

In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact.  It doesn't quite work that way in certain rare
cases involving MultiXacts, though.  It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing.  This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
2022-12-28 08:50:47 -08:00
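
Stripped to its control flow, the page-level trigger reads roughly like
the sketch below; this is illustrative only, not the actual logic in
heap_prepare_freeze_tuple and its callers:

    #include <stdbool.h>
    #include <stdio.h>

    /* Freeze the whole page when freezing is required anyway (some XID
     * crossed the freeze cutoff), or opportunistically when pruning
     * already wrote an FPI and freezing would leave the page all-frozen
     * in the visibility map. */
    static bool
    should_freeze_page(bool freeze_required, bool prune_emitted_fpi,
                       bool all_frozen_after_freeze)
    {
        if (freeze_required)
            return true;
        return prune_emitted_fpi && all_frozen_after_freeze;
    }

    int main(void)
    {
        printf("%d\n", should_freeze_page(false, true, true));
        return 0;
    }
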
Peter Geoghegan 63c844a0a5 Remove overzealous MultiXact freeze assertion.
When VACUUM determines that an existing MultiXact should use a freeze
plan that sets xmax to InvalidTransactionId, the original Multi may or
may not be before OldestMxact.  Remove an incorrect assertion that
expected it to always be from before OldestMxact.

Oversight in commit 4ce3af.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com
2022-12-26 23:36:02 -08:00
Peter Geoghegan 4ce3afb82e Refactor how VACUUM passes around its XID cutoffs.
Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as
FreezeLimit and OldestXmin.  This state is initialized in vacuum.c, and
then passed around by code from vacuumlazy.c to heapam.c freezing
related routines.  The new convention is that everybody works off of the
same cutoff state, which is passed around via pointers to const.

Also simplify some of the logic for dealing with frozen xmin in
heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to
clearly distinguish xmin XIDs that we're going to freeze from those that
were already frozen from before.  That way the routine's xmin handling
code is symmetrical with the existing xmax handling code.  This is
preparation for an upcoming commit that will add page level freezing.

Also refactor the control flow within FreezeMultiXactId(), while adding
stricter sanity checks.  We now test OldestXmin directly, instead of
using FreezeLimit as an inexact proxy for OldestXmin.  This is further
preparation for the page level freezing work, which will make the
function's caller cede control of page level freezing to the function
where appropriate (where heap_prepare_freeze_tuple sees a tuple that
happens to contain a MultiXactId in its xmax).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
2022-12-22 09:37:59 -08:00
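
A sketch of the shared cutoff state and the pass-by-const-pointer
convention; the field names follow the commit message, the rest is
illustrative:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;
    typedef uint32_t MultiXactId;

    struct VacuumCutoffs
    {
        TransactionId OldestXmin;
        TransactionId FreezeLimit;
        MultiXactId   OldestMxact;
        MultiXactId   MultiXactCutoff;
    };

    /* Callees receive the one shared state as a pointer to const, so no
     * routine can drift off onto its own copy of the cutoffs. */
    static void
    freeze_page(const struct VacuumCutoffs *cutoffs)
    {
        printf("freezing against FreezeLimit %u\n",
               (unsigned) cutoffs->FreezeLimit);
    }

    int main(void)
    {
        struct VacuumCutoffs cutoffs = {100, 90, 50, 40};

        freeze_page(&cutoffs);
        return 0;
    }
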
Peter Eisentraut 75f49221c2 Static assertions cleanup
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.

Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level.  This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.

Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-15 10:10:32 +01:00
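
In plain C11 terms the distinction looks like this; PostgreSQL's
StaticAssertDecl/StaticAssertStmt wrap the same mechanism, and the
sketch assumes a C11 compiler:

    #include <assert.h>   /* C11 defines static_assert here */
    #include <stdint.h>

    /* Declaration position: file level, outside any function -- the
     * "native" placement the commit moves assertions toward. */
    static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

    int main(void)
    {
        /* Statement position: inside a function body, where some
         * assertions had been stuck out of necessity. */
        static_assert(sizeof(uint64_t) == 8, "uint64_t must be 8 bytes");
        return 0;
    }
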
Peter Geoghegan 02d647bbf0 Don't test HEAP_XMAX_INVALID when freezing xmax.
We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in
t_infomask when considering whether or not an xmax should be deemed
already frozen, since that status flag is just a hint.  The only
acceptable representation for an "xmax_already_frozen" raw xmax field is
the transaction ID value zero (also known as InvalidTransactionId).

Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to
make the rule about xmax_already_frozen clear.  Also avoid needlessly
rereading the tuple's raw xmax.

Oversight in bugfix commit d2599ecf.  There is no evidence that this
ever led to incorrect behavior, so no backpatch.  The worst consequence
of this bug was that VACUUM could hypothetically fail to notice and
report on certain kinds of corruption, which seems fairly benign.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com
2022-11-23 10:49:39 -08:00
Peter Geoghegan 1489b1ce72 Standardize rmgrdesc recovery conflict XID output.
Standardize on the name snapshotConflictHorizon for all XID fields from
WAL records that generate recovery conflicts when in hot standby mode.
This supersedes the previous latestRemovedXid naming convention.

The new naming convention places emphasis on how the values are actually
used by REDO routines.  How the values are generated during original
execution (details of which vary by record type) is deemphasized.  Users
of tools like pg_waldump can now grep for snapshotConflictHorizon to see
all potential sources of recovery conflicts in a standardized way,
without necessarily having to consider which specific record types might
be involved.

Also bring a couple of WAL record types that didn't follow any kind of
naming convention into line.  These are heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type.  Now every WAL record whose REDO
routine calls ResolveRecoveryConflictWithSnapshot() passes through the
snapshotConflictHorizon field from its WAL record.  This is follow-up
work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL
records use a standard snapshotConflictHorizon style XID cutoff.

No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL
records doesn't change.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
2022-11-17 14:55:08 -08:00
Peter Geoghegan 813492dacc Use correct type name in comments about freezing.
Oversight in commit 9e540599, which added freeze plan deduplication.
2022-11-17 09:34:12 -08:00
Peter Eisentraut 8e1db29cdb Variable renaming in preparation for refactoring
Rename page -> block and dp -> page where appropriate.  The old naming
mixed up block and page in confusing ways.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2022-11-16 16:40:34 +01:00
Peter Eisentraut d1cb4e9f92 Remove useless casts
Maybe these are left from when PageGetItem() was a macro, but now they
are clearly useless.
2022-11-16 16:01:06 +01:00
Peter Eisentraut 4eb3b11200 Turn HeapKeyTest macro into inline function
It is easier to read as a function.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2022-11-16 13:26:48 +01:00
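
The shape of such a conversion, shown on an invented stand-in rather
than HeapKeyTest itself: the inline function gains typed parameters,
evaluates each argument exactly once, and is far easier to step through:

    #include <stdbool.h>
    #include <stdio.h>

    /* Macro version: arguments may be evaluated more than once. */
    #define KEY_TEST(v, lo, hi)  ((v) >= (lo) && (v) <= (hi))

    /* Inline-function version: equivalent code, better ergonomics. */
    static inline bool
    key_test(int value, int lo, int hi)
    {
        return value >= lo && value <= hi;
    }

    int main(void)
    {
        printf("%d %d\n", KEY_TEST(5, 1, 10), key_test(5, 1, 10));
        return 0;
    }
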
Peter Geoghegan 9e5405993c Deduplicate freeze plans in freeze WAL records.
Make heapam WAL records that describe freezing performed by VACUUM more
space efficient by storing each distinct "freeze plan" once, alongside
an array of associated page offset numbers (one per freeze plan).  The
freeze plans required for most heap pages tend to naturally have a great
deal of redundancy, so this technique is very effective in practice.  It
often leads to freeze WAL records that are less than 20% of the size of
equivalent WAL records generated using the previous approach.

The freeze plan concept was introduced by commit 3b97e6823b, which fixed
bugs in VACUUM's handling of MultiXacts.  We retain the concept of
freeze plans, but go back to using page offset number arrays.  There is
no loss of generality here because deduplication is an additive process
that gets applied mechanically when FREEZE_PAGE WAL records are built.

More than anything else, freeze plan deduplication is an optimization
that reduces the marginal cost of freezing additional tuples on pages
that will need to have at least one or two tuples frozen in any case.
Ongoing work that adds page-level freezing to VACUUM will take full
advantage of the improved cost profile through batching.

Also refactor some of the details surrounding recovery conflicts needed
to REDO freeze records in passing: make original execution responsible
for generating a standard latestRemovedXid cutoff, rather than working
backwards to get the same cutoff in the REDO routine.  Bugfix commit
66fbcb0d2e did it the other way around, which is equivalent but obscures
what's going on.

Also rename the cutoff field from the WAL record/struct (rename the
field cutoff_xid to latestRemovedXid to match similar WAL records).
Processing of conflicts by REDO routines is already completely uniform,
so tools like pg_waldump should present the information driving the
process uniformly.  There are two remaining WAL record types that still
don't quite follow this convention (heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type).  They can be brought into line
by later work that totally standardizes how the cutoffs are presented.

Bump XLOG_PAGE_MAGIC.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
2022-11-15 07:48:41 -08:00
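
The deduplication itself is mechanical, as the message says. A
self-contained toy version, with an invented two-field "plan" standing
in for the real freeze plan:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t OffsetNumber;

    typedef struct
    {
        uint32_t xmax;        /* invented stand-in fields */
        uint16_t infomask;
    } FreezePlan;

    int main(void)
    {
        /* Per-tuple plans for offsets 1..5; note the redundancy. */
        FreezePlan   tup[] = {{10, 1}, {10, 1}, {20, 3}, {10, 1}, {20, 3}};
        int          ntup = 5;
        FreezePlan   plans[5];
        OffsetNumber offsets[5][5];
        int          noffsets[5] = {0};
        int          nplans = 0;

        /* Store each distinct plan once, plus an offset array per plan. */
        for (int i = 0; i < ntup; i++)
        {
            int j;

            for (j = 0; j < nplans; j++)
                if (plans[j].xmax == tup[i].xmax &&
                    plans[j].infomask == tup[i].infomask)
                    break;
            if (j == nplans)
                plans[nplans++] = tup[i];
            offsets[j][noffsets[j]++] = (OffsetNumber) (i + 1);
        }
        printf("%d tuples deduplicated into %d plans\n", ntup, nplans);
        return 0;
    }
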
Jeff Davis 97c61f70d1 Document WAL rules related to PD_ALL_VISIBLE in README.
Also improve comments.

Discussion: https://postgr.es/m/a50005c1c537f89bb359057fd70e66bb83bce969.camel@j-davis.com
Reviewed-by: Peter Geoghegan
2022-11-12 08:37:50 -08:00
Jeff Davis d6a3dbe14f Fix theoretical torn page hazard.
The original report was concerned with a possible inconsistency
between the heap and the visibility map, which I was unable to
confirm. The concern has been retracted.

However, there did seem to be a torn page hazard when using
checksums. By not setting the heap page LSN during redo, the
protections of minRecoveryPoint were bypassed. Fixed, along with a
misleading comment.

It may have been impossible to hit this problem in practice, because
it would require a page tear between the checksum and the flags, so I
am marking this as a theoretical risk. But, as discussed, it did
violate expectations about the page LSN, so it may have other
consequences.

Backpatch to all supported versions.

Reported-by: Konstantin Knizhnik
Reviewed-by: Konstantin Knizhnik
Discussion: https://postgr.es/m/fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru
Backpatch-through: 11
2022-11-11 12:38:29 -08:00
Jeff Davis 3eb8eeccbe Remove obsolete comments and code from prior to f8f4227976.
XLogReadBufferForRedo() and XLogReadBufferForRedoExtended() only return
BLK_NEEDS_REDO if the record LSN is greater than the page LSN, so
the redo routine doesn't need to do the LSN check again.

Discussion: https://postgr.es/m/0c37b80e62b1f3007d5a6d1292bd8fa0c275627a.camel@j-davis.com
2022-11-11 08:49:30 -08:00
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
David Rowley 2d0bbedda7 Rename shadowed local variables
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.

This fixes 63 warnings and leaves just 5.

Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-10-05 21:01:41 +13:00
Jeff Davis 163b0993a1 Fix race condition where heap_delete() fails to pin VM page.
Similar to 5f12bc94dc, the code must re-check PageIsAllVisible() after
buffer lock is re-acquired. Backpatching to the same version, 12.

Discussion: https://postgr.es/m/CAEP4nAw9jYQDKd_5Y+-s2E4YiUJq1vqiikFjYGpLShtp-K3gag@mail.gmail.com
Reported-by: Robins Tharakan
Reviewed-by: Robins Tharakan
Backpatch-through: 12
2022-09-22 11:04:00 -07:00
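
The underlying pattern is general: any condition tested before a lock is
(re)acquired must be tested again afterwards. A minimal pthread sketch
of the idea; the real fix re-checks PageIsAllVisible() under the buffer
lock:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool page_all_visible = true;

    static void
    delete_tuple(void)
    {
        bool all_visible = page_all_visible;    /* checked before locking */

        pthread_mutex_lock(&buffer_lock);
        all_visible = page_all_visible;         /* must re-check under lock */
        if (all_visible)
        {
            /* ... pin the visibility-map page before modifying the heap
             * page, now that the answer cannot go stale under us ... */
        }
        pthread_mutex_unlock(&buffer_lock);
    }

    int main(void)
    {
        delete_tuple();
        printf("ok\n");
        return 0;
    }
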
Peter Geoghegan 4bac9600f0 Harmonize heapam and tableam parameter names.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions.  Having parameter names
that are reliably consistent in this way will make it easier to reason
about groups of related C functions from the same translation unit as a
module.  It will also make certain refactoring tasks easier.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 16:46:23 -07:00
Peter Geoghegan 9887dd38f9 Adjust comments that called MultiXactIds "XMIDs".
Oversights in commits 0b018fab and f3c15cbe.
2022-08-29 19:42:30 -07:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber field from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
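
In outline, the renamed types look like the sketch below; field names
follow the commit message, and the widths are today's, which the rename
exists to let future patches change:

    #include <stdio.h>

    typedef unsigned int Oid;

    /* (1) just the number: currently the same width as Oid, but now
     * nameable -- and widenable -- on its own. */
    typedef Oid RelFileNumber;

    /* (2) everything needed to locate a relation's files on disk. */
    typedef struct RelFileLocator
    {
        Oid           spcOid;      /* tablespace */
        Oid           dbOid;       /* database */
        RelFileNumber relNumber;   /* relfilenumber */
    } RelFileLocator;

    int main(void)
    {
        RelFileLocator loc = {1663, 16384, 16385};

        /* e.g. a default-tablespace relation lives at base/<db>/<relnumber> */
        printf("base/%u/%u\n", loc.dbOid, loc.relNumber);
        return 0;
    }
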
Tomas Vondra e3fcca0d0d Revert changes in HOT handling of BRIN indexes
This reverts commits 5753d4ee32 and fe60b67250 that modified HOT to
ignore BRIN indexes. The commit message for 5753d4ee32 claims that:

    When determining whether an index update may be skipped by using
    HOT, we can ignore attributes indexed only by BRIN indexes. There
    are no index pointers to individual tuples in BRIN, and the page
    range summary will be updated anyway as it relies on visibility
    info.

This is partially incorrect - it's true BRIN indexes don't point to
individual tuples, so HOT chains are not an issue, but the visibility
info is not sufficient to keep the index up to date. This can easily
result in corrupted indexes, as demonstrated in the hackers thread.

This does not mean relaxing the HOT restrictions for BRIN is a lost
cause, but it needs to handle the two aspects (allowing HOT chains and
updating the page range summaries) separately. But that requires major
changes, and it's too late for that in the current dev cycle.

Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
2022-06-16 15:02:49 +02:00
Tom Lane 7b7ed046cb Prevent access to no-longer-pinned buffer in heapam_tuple_lock().
heap_fetch() used to have a "keep_buf" parameter that told it to return
ownership of the buffer pin to the caller after finding that the
requested tuple TID exists but is invisible to the specified snapshot.
This was thoughtlessly removed in commit 5db6df0c0, which broke
heapam_tuple_lock() (formerly EvalPlanQualFetch) because that function
needs to do more accesses to the tuple even if it's invisible.  The net
effect is that we would continue to touch the page for a microsecond or
two after releasing pin on the buffer.  Usually no harm would result;
but if a different session decided to defragment the page concurrently,
we could see garbage data and mistakenly conclude that there's no newer
tuple version to chain up to.  (It's hard to say whether this has
happened in the field.  The bug was actually found thanks to a later
change that allowed valgrind to detect accesses to non-pinned buffers.)

The most reasonable way to fix this is to reintroduce keep_buf,
although I made it behave slightly differently: buffer ownership
is passed back only if there is a valid tuple at the requested TID.
In HEAD, we can just add the parameter back to heap_fetch().
To avoid an API break in the back branches, introduce an additional
function heap_fetch_extended() in those branches.

In HEAD there is an additional, less obvious API change: tuple->t_data
will be set to NULL in all cases where buffer ownership is not returned,
in particular when the tuple exists but fails the time qual (and
!keep_buf).  This is to defend against any other callers attempting to
access non-pinned buffers.  We concluded that making that change in back
branches would be more likely to introduce problems than cure any.

In passing, remove a comment about heap_fetch that was obsoleted by
9a8ee1dc6.

Per bug #17462 from Daniil Anisimov.  Back-patch to v12 where the bug
was introduced.

Discussion: https://postgr.es/m/17462-9c98a0f00df9bd36@postgresql.org
2022-04-13 13:35:07 -04:00
Alvaro Herrera 24d2b2680a Remove extraneous blank lines before block-closing braces
These are useless and distracting.  We wouldn't have written the code
with them to begin with, so there's no reason to keep them.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-13 19:16:02 +02:00
Tom Lane bd037dc928 Make XLogRecGetBlockTag() throw error if there's no such block.
All but a few existing callers assume without checking that this
function succeeds.  While it probably will, that's a poor excuse for
not checking.  Let's make it return void and instead throw an error
if it doesn't find the block reference.  Callers that actually need
to handle the no-such-block case must now use the underlying function
XLogRecGetBlockTagExtended.

In addition to being a bit less error-prone, this should also serve
to suppress some Coverity complaints about XLogRecGetBlockRefInfo.

While at it, clean up some inconsistency about use of the
XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use
that instead of open-coding the same condition, and avoid calling
XLogRecHasBlockRef twice in relevant code paths.  (That is,
calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now
deprecated: use XLogRecGetBlockTagExtended instead.)

Patch HEAD only; this doesn't seem to have enough value to consider
a back-branch API break.

Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
2022-04-11 17:43:53 -04:00
David Rowley b0e5f02ddc Fix various typos and spelling mistakes in code comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
2022-04-11 20:49:41 +12:00