Commit Graph

4646 Commits

Author SHA1 Message Date
Peter Geoghegan 5212d447fa Update obsolete multixact.c comments.
Commit 4f627f89 switched SLRU truncation for multixacts back to being a
task performed during VACUUM, but missed some comments that continued to
reference truncation happening as part of checkpointing.  Update those
comments now.

Also update comments that became obsolete when commit c3ffa731 changed
the way that vacuum_multixact_freeze_min_age is applied by VACUUM as it
computes its MultiXactCutoff cutoff (which is used by VACUUM to decide
what to freeze).  Explain the same issues by referencing how OldestMxact
is the latest valid value that relminmxid can ever be advanced to at the
end of a VACUUM (following the work in commit 0b018fab).
2023-01-03 16:54:35 -08:00
Peter Geoghegan 54afdcd618 vacuumlazy.c: Save get_database_name() in vacrel.
This brings dbname strings in line with namespace and relation name
strings.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkQ1TKU-DdNvnGeL870di3+CU1UTo-7nw7xFDpVE-XGjA@mail.gmail.com
2023-01-03 11:48:47 -08:00
Peter Geoghegan 79d4bf4eff Delay commit status checks until freezing executes.
pg_xact lookups are relatively expensive.  Move the xmin/xmax commit
status checks from the point that freeze plans are prepared to the point
that they're actually executed.  Otherwise we'll repeat many commit
status checks whenever multiple successive VACUUM operations scan the
same pages and decide against freezing each time, which is a waste of
cycles.

Oversight in commit 1de58df4, which added page-level freezing.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
2023-01-03 11:22:36 -08:00
Peter Geoghegan b37a083239 Refine the definition of page-level freezing.
Improve comments added by commit 1de58df4 which describe the
lazy_scan_prune "freeze the page" path.  These newly revised comments
are based on suggestions from Jeff Davis.

In passing, remove nearby visibility_cutoff_xid comments left over from
commit 6daeeb1f.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com
2023-01-03 10:08:55 -08:00
Michael Paquier 33ab0a2a52 Fix typos in comments, code and documentation
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes.  One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
2023-01-03 16:26:14 +09:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Peter Geoghegan 325bc54eed Adjust VACUUM hastup LP_REDIRECT comments.
The term "truncation" has been ambiguous since commit 10a8d13823 added
line pointer array truncation during heap pruning.  Clear things up by
specifying that we're talking about rel truncation here, to match nearby
comments that apply to tuples with storage.
2023-01-02 10:18:22 -08:00
Peter Geoghegan 6daeeb1f91 Avoid special XID snapshotConflictHorizon values.
Don't allow VACUUM to WAL-log the value FrozenTransactionId as the
snapshotConflictHorizon of freezing or visibility map related WAL
records.

The only special XID value that's an allowable snapshotConflictHorizon
is InvalidTransactionId, which is interpreted as "record definitely
doesn't require a recovery conflict".

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
2023-01-02 10:16:51 -08:00
Peter Eisentraut e351f85418 Push lpp variable closer to usage in heapgetpage()
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2023-01-02 09:39:31 +01:00
Tomas Vondra 02699bc1fd Fix assert in BRIN build_distances
When brin_minmax_multi_union merges summaries, we may end up with just a
single range after merge_overlapping_ranges. The summaries may contain
just one range each, and they may overlap (or be exactly the same).

With a single range there's no distance to calculate, but we happen to
call build_distances anyway - which is fine, we don't calculate the
distance in this case, except that with asserts this failed due to a
check there are at least two ranges.

The assert is unnecessarily strict, so relax it a bit and bail out if
there's just a single range. The relaxed assert would be enough, but
this way we don't allocate unnecessary memory for distance.

Backpatch to 14, where minmax-multi opclasses were introduced.

Reported-by: Jaime Casanova
Backpatch-through: 14
Discussion: https://postgr.es/m/YzVA55qS0hgz8P3r@ahch-to
2022-12-30 20:49:50 +01:00
Peter Eisentraut faf3750657 Add const to BufFileWrite
Make data buffer argument to BufFileWrite a const pointer and bubble
this up to various callers and related APIs.  This makes the APIs
clearer and more consistent.

Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
2022-12-30 10:12:24 +01:00
Peter Geoghegan 1de58df4fe Add page-level freezing to VACUUM.
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages.  Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.

Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes.  The overall amount of WAL written is the
single most important freezing related cost, in general.  Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.

Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI.  We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early.  This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.

In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact.  It doesn't quite work that way in certain rare
cases involving MultiXacts, though.  It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing.  This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
2022-12-28 08:50:47 -08:00
Peter Geoghegan 63c844a0a5 Remove overzealous MultiXact freeze assertion.
When VACUUM determines that an existing MultiXact should use a freeze
plan that sets xmax to InvalidTransactionId, the original Multi may or
may not be before OldestMxact.  Remove an incorrect assertion that
expected it to always be from before OldestMxact.

Oversight in commit 4ce3af.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com
2022-12-26 23:36:02 -08:00
Michael Paquier 13e0d7a603 Rename pg_dissect_walfile_name() to pg_split_walfile_name()
The former name was discussed as being confusing, so use "split", as per
a suggestion from Magnus Hagander.

While on it, one of the output arguments is renamed from "segno" to
"segment_number", as per a suggestion from Kyotaro Horiguchi.

The documentation is updated to reflect all these changes.

Bump catalog version.

Author: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/CABUevEytQVaOOhGdoh0D7hGwe3fuKcRF6NthsSW7ww04EmtFgQ@mail.gmail.com
2022-12-23 09:15:01 +09:00
Peter Geoghegan 07eef53955 Use scanned_pages to decide when to failsafe check.
Perform a failsafe check every time VACUUM's first heap scan scans a
further FAILSAFE_EVERY_PAGES pages, rather than using an approach based
on the number of physical blocks that our current blkno is from the
blkno at the time of the previous failsafe check.  That way VACUUM will
perform a failsafe check every time it has scanned a uniform number of
pages, without it mattering when or how VACUUM skipped pages using the
visibility map.

Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me.

Author: Sami Imseih <simseih@amazon.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com
2022-12-22 10:41:40 -08:00
Peter Geoghegan 4ce3afb82e Refactor how VACUUM passes around its XID cutoffs.
Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as
FreezeLimit and OldestXmin.  This state is initialized in vacuum.c, and
then passed around by code from vacuumlazy.c to heapam.c freezing
related routines.  The new convention is that everybody works off of the
same cutoff state, which is passed around via pointers to const.

Also simplify some of the logic for dealing with frozen xmin in
heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to
clearly distinguish xmin XIDs that we're going to freeze from those that
were already frozen from before.  That way the routine's xmin handling
code is symmetrical with the existing xmax handling code.  This is
preparation for an upcoming commit that will add page level freezing.

Also refactor the control flow within FreezeMultiXactId(), while adding
stricter sanity checks.  We now test OldestXmin directly, instead of
using FreezeLimit as an inexact proxy for OldestXmin.  This is further
preparation for the page level freezing work, which will make the
function's caller cede control of page level freezing to the function
where appropriate (where heap_prepare_freeze_tuple sees a tuple that
happens to contain a MultiXactId in its xmax).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
2022-12-22 09:37:59 -08:00
Michael Paquier 22e3b55805 Switch some system functions to use get_call_result_type()
This shaves some code by replacing the combinations of
CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of
the attributes listed in pg_proc.dat by get_call_result_type() to build
the TupleDesc needed for the rows generated.

get_call_result_type() is more expensive than the former style, but this
removes some duplication with the lists of OUT parameters (pg_proc.dat
and the attributes hardcoded in these code paths).  This is applied to
functions that are not considered as critical (aka that could be called
repeatedly for monitoring purposes).

Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
2022-12-21 10:11:22 +09:00
Andrew Dunstan 8284cf5f74 Add copyright notices to meson files
Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
2022-12-20 07:54:39 -05:00
Michael Paquier cca1863489 Add pg_dissect_walfile_name()
This function takes in input a WAL segment name and returns a tuple made
of the segment sequence number (dependent on the WAL segment size of the
cluster) and its timeline, as of a thin SQL wrapper around the existing
XLogFromFileName().

This function has multiple usages, like being able to compile a LSN from
a file name and an offset, or finding the timeline of a segment without
having to do to some maths based on the first eight characters of the
segment.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
2022-12-20 13:36:27 +09:00
Peter Eisentraut 75f49221c2 Static assertions cleanup
Because we added StaticAssertStmt() first before StaticAssertDecl(),
some uses as well as the instructions in c.h are now a bit backwards
from the "native" way static assertions are meant to be used in C.
This updates the guidance and moves some static assertions to better
places.

Specifically, since the addition of StaticAssertDecl(), we can put
static assertions at the file level.  This moves a number of static
assertions out of function bodies, where they might have been stuck
out of necessity, to perhaps better places at the file level or in
header files.

Also, when the static assertion appears in a position where a
declaration is allowed, then using StaticAssertDecl() is more native
than StaticAssertStmt().

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
2022-12-15 10:10:32 +01:00
Tom Lane 20432f8731 Rethink handling of [Prevent|Is]InTransactionBlock in pipeline mode.
Commits f92944137 et al. made IsInTransactionBlock() set the
XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false",
on the grounds that that kept its API promises equivalent to those of
PreventInTransactionBlock().  This turns out to be a bad idea though,
because it allows an ANALYZE in a pipelined series of commands to
cause an immediate commit, which is unexpected.

Furthermore, if we return "false" then we have another issue,
which is that ANALYZE will decide it's allowed to do internal
commit-and-start-transaction sequences, thus possibly unexpectedly
committing the effects of previous commands in the pipeline.

To fix the latter situation, invent another transaction state flag
XACT_FLAGS_PIPELINING, which explicitly records the fact that we
have executed some extended-protocol command and not yet seen a
commit for it.  Then, require that flag to not be set before allowing
InTransactionBlock() to return "false".

Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT
without fear of causing problems.  This means that the API guarantees
of IsInTransactionBlock now diverge from PreventInTransactionBlock,
which is mildly annoying, but it seems OK given the very limited usage
of IsInTransactionBlock.  (In any case, a caller preferring the old
behavior could always set NEEDIMMEDIATECOMMIT for itself.)

For consistency also require XACT_FLAGS_PIPELINING to not be set
in PreventInTransactionBlock.  This too is meant to prevent commands
such as CREATE DATABASE from silently committing previous commands
in a pipeline.

Per report from Peter Eisentraut.  As before, back-patch to all
supported branches (which sadly no longer includes v10).

Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
2022-12-13 14:23:58 -05:00
Tom Lane 2661469d86 Allow DateTimeParseError to handle bad-timezone error messages.
Pay down some ancient technical debt (dating to commit 022fd9966):
fix a couple of places in datetime parsing that were throwing
ereport's immediately instead of returning a DTERR code that could be
interpreted by DateTimeParseError.  The reason for that was that there
was no mechanism for passing any auxiliary data (such as a zone name)
to DateTimeParseError, and these errors seemed to really need it.
Up to now it didn't matter that much just where the error got thrown,
but now we'd like to have a hard policy that datetime parse errors
get thrown from just the one place.

Hence, invent a "DateTimeErrorExtra" struct that can be used to
carry any extra values needed for specific DTERR codes.  Perhaps
in the future somebody will be motivated to use this to improve
the specificity of other DateTimeParseError messages, but for now
just deal with the timezone-error cases.

This is on the way to making the datetime input functions report
parse errors softly; but it's really an independent change, so
commit separately.

Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
2022-12-09 13:30:47 -05:00
Michael Paquier 83a1a1b566 Generate pg_stat_get*() functions for tables using macros
The same code pattern is repeated 17 times for int64 counters (0 for
missing entry) and 5 times for timestamps (NULL for missing entry) on
table entries.  This code is switched to use a macro for the basic code
instead, shaving a few hundred lines of originally-duplicated code.  The
function names remain the same, but some fields of PgStat_StatTabEntry
have to be renamed to cope with the new style.

Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https:/postgr.es/m/20221204173207.GA2669116@nathanxps13
2022-12-06 10:46:35 +09:00
Alexander Korotkov 941aa6a626 Check the snapshot argument of index_beginscan and family
Passing a NULL snapshot (InvalidSnapshot) is going to work but only as long
as the index can't find any matching rows.  This can be confusing for
the extension authors, so add an explicit check for this argument.  The check
is implemented with Assert() in order to avoid overhead in release builds.

Reported-by: Sven Klemm
Discussion: https://postgr.es/m/CAJ7c6TPxitD4vbKyP-mpmC1XwyHdPPqvjLzm%2BVpB88h8LGgneQ%40mail.gmail.com
Author: Aleksander Alekseev
Reviewed-by: Pavel Borisov
2022-12-06 03:29:18 +03:00
Michael Paquier 71cb84ec69 Add LSN location in some error messages related to WAL pages
The error messages reported during any failures while reading or
validating the header of a WAL currently includes only the offset of the
page but not the compiled LSN referring to the page, requiring an extra
step to compile it if looking at the surroundings with pg_waldump or
similar.  Adding this information costs a bit in translation, but also
eases debugging.

Author: Bharath Rupireddy
Reviewed-by:  Álvaro Herrera, Kyotaro Horiguchi, Maxim Orlov, Michael
Paquier
Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
2022-12-05 09:28:29 +09:00
Alvaro Herrera fb958b5da8
Generalize ri_RootToPartitionMap to use for non-partition children
ri_RootToPartitionMap is currently only initialized for tuple routing
target partitions, though a future commit will need the ability to use
it even for the non-partition child tables, so make adjustments to the
decouple it from the partitioning code.

Also, make it lazily initialized via ExecGetRootToChildMap(), making
that function its preferred access path.  Existing third-party code
accessing it directly should no longer do so; consequently, it's been
renamed to ri_RootToChildMap, which also makes it consistent with
ri_ChildToRootMap.

ExecGetRootToChildMap() houses the logic of setting the map appropriately
depending on whether a given child relation is partition or not.

To support this, also add a separate entry point for TupleConversionMap
creation that receives an AttrMap.  No new code here, just split an
existing function in two.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
2022-12-02 10:35:55 +01:00
Jeff Davis edf12e7bbd Fix memory leak for hashing with nondeterministic collations.
Backpatch through 12, where nondeterministic collations were
introduced (5e1963fb76).

Backpatch-through: 12
2022-12-01 11:49:15 -08:00
Tom Lane 8242752f9c Improve heuristics for compressing the KnownAssignedXids array.
Previously, we'd compress only when the active range of array entries
reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids).
If max_connections is large, the first term could result in not
compressing for a long time, resulting in much wastage of cycles in
hot-standby backends scanning the array to take snapshots.  Get rid
of that term, and just bound it to 2 * pArray->numKnownAssignedXids.

That however creates the opposite risk, that we might spend too much
effort compressing.  Hence, consider compressing only once every 128
commit records.  (This frequency was chosen by benchmarking.  While
we only tried one benchmark scenario, the results seem stable over
a fairly wide range of frequencies.)

Also, force compression when processing RecoveryInfo WAL records
(which should be infrequent); the old code could perform compression
then, but would do so only after the same array-range check as for
the transaction-commit path.

Also, opportunistically run compression if the startup process is about
to wait for WAL, though not oftener than once a second.  This should
prevent cases where we waste lots of time by leaving the array
not-compressed for long intervals due to low WAL traffic.

Lastly, add a simple check to keep us from uselessly compressing
when the array storage is already compact.

Back-patch, as the performance problem is worse in pre-v14 branches
than in HEAD.

Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.

Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
2022-11-29 15:43:17 -05:00
Alvaro Herrera ad86d159b6
Add 'missing_ok' argument to build_attrmap_by_name
When it's given as true, return a 0 in the position of the missing
column rather than raising an error.

This is currently unused, but it allows us to reimplement column
permission checking in a subsequent commit.  It seems worth breaking
into a separate commit because it affects unrelated code.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com
2022-11-29 09:39:36 +01:00
Thomas Munro cd4329d939 Remove promote_trigger_file.
Previously, an idle startup (recovery) process would wake up every 5
seconds to have a chance to poll for promote_trigger_file, even if that
GUC was not configured.  That promotion triggering mechanism was
effectively superseded by pg_ctl promote and pg_promote() a long time
ago.  There probably aren't many users left and it's very easy to change
to the modern mechanisms, so we agreed to remove the feature.

This is part of a campaign to reduce wakeups on idle systems.

Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
2022-11-29 12:08:38 +13:00
David Rowley ec5affdbc2 Improve indenting in _hash_pgaddtup
The Assert added in d09dbeb9b came out rather ugly after having run
pgindent on that code.  Here we adjust things to use some local variables
so that the Assert remains within the 80-character margin.

Author: Ted Yu
Discussion: https://postgr.es/m/CALte62wLSir1=x93Jf0xZvHaO009FEJfhVMFwnaR8q=csPP8kQ@mail.gmail.com
2022-11-25 10:10:44 +13:00
Alvaro Herrera 2cf41cd309
Make multixact error message more explicit
There are recent reports involving a very old error message that we have
no history of hitting -- perhaps a recently introduced bug.  Improve the
error message in an attempt to improve our chances of investigating the
bug.

Per reports from Dimos Stamatakis and Bob Krier.

Backpatch to 11.

Discussion: https://postgr.es/m/CO2PR0801MB2310579F65529380A4E5EDC0E20A9@CO2PR0801MB2310.namprd08.prod.outlook.com
Discussion: https://postgr.es/m/17518-04e368df5ad7f2ee@postgresql.org
2022-11-24 10:45:10 +01:00
David Rowley d09dbeb9bd Speedup hash index builds by skipping needless binary searches
When building hash indexes using the spool method, tuples are added to the
index page in hashkey order.  Because of this, we can safely skip
performing the binary search on the existing tuples on the page to find
the location to insert the tuple based on its hashkey value.  For this
case, we can just always put the tuple at the end of the item array as the
tuples will always arrive in hashkey order.

Testing has shown that this can improve hash index build speeds by 5-15%
with a unique set of integer values.

Author: Simon Riggs
Reviewed-by: David Rowley
Tested-by: David Zhang, Tomas Vondra
Discussion: https://postgr.es/m/CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com
2022-11-24 17:21:44 +13:00
Peter Geoghegan b6074846ce Simplify vacuum_set_xid_limits() signature.
Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits()
directly, rather than passing most individual VacuumParams fields as
separate arguments.

Also make vacuum_set_xid_limits() output parameter symbol names match
those used by its vacuumlazy.c caller.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com
2022-11-23 11:10:06 -08:00
Peter Geoghegan 02d647bbf0 Don't test HEAP_XMAX_INVALID when freezing xmax.
We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in
t_infomask when considering whether or not an xmax should be deemed
already frozen, since that status flag is just a hint.  The only
acceptable representation for an "xmax_already_frozen" raw xmax field is
the transaction ID value zero (also known as InvalidTransactionId).

Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to
make the rule about xmax_already_frozen clear.  Also avoid needlessly
rereading the tuple's raw xmax.

Oversight in bugfix commit d2599ecf.  There is no evidence that this
ever led to incorrect behavior, so no backpatch.  The worst consequence
of this bug was that VACUUM could hypothetically fail to notice and
report on certain kinds of corruption, which seems fairly benign.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com
2022-11-23 10:49:39 -08:00
Andres Freund a4adc31f69 lwlock: Fix quadratic behavior with very long wait lists
Until now LWLockDequeueSelf() sequentially searched the list of waiters to see
if the current proc is still is on the list of waiters, or has already been
removed. In extreme workloads, where the wait lists are very long, this leads
to a quadratic behavior. #backends iterating over a list #backends
long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in
the first place also increases with the increased length of the wait queue, as
it becomes more likely that a lock is released while waiting for the wait list
lock, which is held for longer during lock release.

Due to the exponential back-off in perform_spin_delay() this is surprisingly
hard to detect. We should make that easier, e.g. by adding a wait event around
the pg_usleep() - but that's a separate patch.

The fix is simple - track whether a proc is currently waiting in the wait list
or already removed but waiting to be woken up in PGPROC->lwWaiting.

In some workloads with a lot of clients contending for a small number of
lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

As the quadratic behavior arguably is a bug, we might want to decide to
backpatch this fix in the future.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
2022-11-20 11:56:32 -08:00
Peter Geoghegan 1489b1ce72 Standardize rmgrdesc recovery conflict XID output.
Standardize on the name snapshotConflictHorizon for all XID fields from
WAL records that generate recovery conflicts when in hot standby mode.
This supersedes the previous latestRemovedXid naming convention.

The new naming convention places emphasis on how the values are actually
used by REDO routines.  How the values are generated during original
execution (details of which vary by record type) is deemphasized.  Users
of tools like pg_waldump can now grep for snapshotConflictHorizon to see
all potential sources of recovery conflicts in a standardized way,
without necessarily having to consider which specific record types might
be involved.

Also bring a couple of WAL record types that didn't follow any kind of
naming convention into line.  These are heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type.  Now every WAL record whose REDO
routine calls ResolveRecoveryConflictWithSnapshot() passes through the
snapshotConflictHorizon field from its WAL record.  This is follow-up
work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL
records use a standard snapshotConflictHorizon style XID cutoff.

No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL
records doesn't change.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
2022-11-17 14:55:08 -08:00
Peter Geoghegan 813492dacc Use correct type name in comments about freezing.
Oversight in commit 9e540599, which added freeze plan deduplication.
2022-11-17 09:34:12 -08:00
Peter Eisentraut 8e1db29cdb Variable renaming in preparation for refactoring
Rename page -> block and dp -> page where appropriate.  The old naming
mixed up block and page in confusing ways.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2022-11-16 16:40:34 +01:00
Peter Eisentraut d1cb4e9f92 Remove useless casts
Maybe these are left from when PageGetItem() was a macro, but now they
are clearly useless.
2022-11-16 16:01:06 +01:00
Peter Eisentraut 4eb3b11200 Turn HeapKeyTest macro into inline function
It is easier to read as a function.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
2022-11-16 13:26:48 +01:00
Jeff Davis 1eda3ce802 Mark argument of RegisterCustomRmgr() as const. 2022-11-15 16:01:35 -08:00
Peter Geoghegan 9e5405993c Deduplicate freeze plans in freeze WAL records.
Make heapam WAL records that describe freezing performed by VACUUM more
space efficient by storing each distinct "freeze plan" once, alongside
an array of associated page offset numbers (one per freeze plan).  The
freeze plans required for most heap pages tend to naturally have a great
deal of redundancy, so this technique is very effective in practice.  It
often leads to freeze WAL records that are less than 20% of the size of
equivalent WAL records generated using the previous approach.

The freeze plan concept was introduced by commit 3b97e6823b, which fixed
bugs in VACUUM's handling of MultiXacts.  We retain the concept of
freeze plans, but go back to using page offset number arrays.  There is
no loss of generality here because deduplication is an additive process
that gets applied mechanically when FREEZE_PAGE WAL records are built.

More than anything else, freeze plan deduplication is an optimization
that reduces the marginal cost of freezing additional tuples on pages
that will need to have at least one or two tuples frozen in any case.
Ongoing work that adds page-level freezing to VACUUM will take full
advantage of the improved cost profile through batching.

Also refactor some of the details surrounding recovery conflicts needed
to REDO freeze records in passing: make original execution responsible
for generating a standard latestRemovedXid cutoff, rather than working
backwards to get the same cutoff in the REDO routine.  Bugfix commit
66fbcb0d2e did it the other way around, which is equivalent but obscures
what's going on.

Also rename the cutoff field from the WAL record/struct (rename the
field cutoff_xid to latestRemovedXid to match similar WAL records).
Processing of conflicts by REDO routines is already completely uniform,
so tools like pg_waldump should present the information driving the
process uniformly.  There are two remaining WAL record types that still
don't quite follow this convention (heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type).  They can be brought into line
by later work that totally standardizes how the cutoffs are presented.

Bump XLOG_PAGE_MAGIC.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
2022-11-15 07:48:41 -08:00
Amit Kapila e848be60b5 Fix cleanup lock acquisition in SPLIT_ALLOCATE_PAGE replay.
During XLOG_HASH_SPLIT_ALLOCATE_PAGE replay, we were checking for a
cleanup lock on the new bucket page after acquiring an exclusive lock on
it and raising a PANIC error on failure. However, it is quite possible
that checkpointer can acquire the pin on the same page before acquiring a
lock on it, and then the replay will lead to an error. So instead, directly
acquire the cleanup lock on the new bucket page during
XLOG_HASH_SPLIT_ALLOCATE_PAGE replay operation.

Reported-by: Andres Freund
Author: Robert Haas
Reviewed-By: Amit Kapila, Andres Freund, Vignesh C
Backpatch-through: 11
Discussion: https://postgr.es/m/20220810022617.fvjkjiauaykwrbse@awork3.anarazel.de
2022-11-14 10:43:33 +05:30
Peter Eisentraut c727f511bd Refactor aclcheck functions
Instead of dozens of mostly-duplicate pg_foo_aclcheck() functions,
write one common function object_aclcheck() that can handle almost all
of them.  We already have all the information we need, such as which
system catalog corresponds to which catalog table and which column is
the ACL column.

There are a few pg_foo_aclcheck() that don't work via the generic
function and have special APIs, so those stay as is.

I also changed most pg_foo_aclmask() functions to static functions,
since they are not used outside of aclchk.c.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/95c30f96-4060-2f48-98b5-a4392d3b6066@enterprisedb.com
2022-11-13 09:02:41 +01:00
Peter Eisentraut afbfc02983 Refactor ownercheck functions
Instead of dozens of mostly-duplicate pg_foo_ownercheck() functions,
write one common function object_ownercheck() that can handle almost
all of them.  We already have all the information we need, such as
which system catalog corresponds to which catalog table and which
column is the owner column.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/95c30f96-4060-2f48-98b5-a4392d3b6066@enterprisedb.com
2022-11-13 08:12:37 +01:00
Jeff Davis 97c61f70d1 Document WAL rules related to PD_ALL_VISIBLE in README.
Also improve comments.

Discussion: https://postgr.es/m/a50005c1c537f89bb359057fd70e66bb83bce969.camel@j-davis.com
Reviewed-by: Peter Geoghegan
2022-11-12 08:37:50 -08:00
Jeff Davis d6a3dbe14f Fix theoretical torn page hazard.
The original report was concerned with a possible inconsistency
between the heap and the visibility map, which I was unable to
confirm. The concern has been retracted.

However, there did seem to be a torn page hazard when using
checksums. By not setting the heap page LSN during redo, the
protections of minRecoveryPoint were bypassed. Fixed, along with a
misleading comment.

It may have been impossible to hit this problem in practice, because
it would require a page tear between the checksum and the flags, so I
am marking this as a theoretical risk. But, as discussed, it did
violate expectations about the page LSN, so it may have other
consequences.

Backpatch to all supported versions.

Reported-by: Konstantin Knizhnik
Reviewed-by: Konstantin Knizhnik
Discussion: https://postgr.es/m/fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru
Backpatch-through: 11
2022-11-11 12:38:29 -08:00
Jeff Davis 3eb8eeccbe Remove obsolete comments and code from prior to f8f4227976.
XLogReadBufferForRedo() and XLogReadBufferForRedoExtended() only return
BLK_NEEDS_REDO if the record LSN is greater than the page LSN, so
the redo routine doesn't need to do the LSN check again.

Discussion: https://postgr.es/m/0c37b80e62b1f3007d5a6d1292bd8fa0c275627a.camel@j-davis.com
2022-11-11 08:49:30 -08:00
Michael Paquier 5ca3645cb3 Fix comment of SimpleLruInit() in slru.c
sync_handler was not mentioned in the comment block of the function.

Oversight in dee663f.

Author: Aleksander Alekseev
Discussion: https://postgr.es/m/CAJ7c6TPUd9BwNY47TtMxaijLHSbyHNdhu=kvbGnvO_bi+oC6_Q@mail.gmail.com
Backpatch-through: 14
2022-11-10 16:32:29 +09:00
Tom Lane 4f981df8e0 Report a more useful error for reloptions on a partitioned table.
Previously, trying to set storage parameters on a partitioned table
always led to "unrecognized parameter foo", because the code expected
there might be some valid parameters; but there aren't any.  The docs
make clear that it's intended that there never will be any, so let's
replace this useless search with a more to-the-point message.

Simon Riggs and Karina Litskevich

Discussion: https://postgr.es/m/CANbhV-H=eZ9kTR9mUgKGK0Qv9uXP=U+dQg3rinQHfTdFMhBA2A@mail.gmail.com
2022-11-09 12:28:34 -05:00
Tom Lane e613ace1f0 Doc: add comments about PreventInTransactionBlock/IsInTransactionBlock.
Add a little to the header comments for these functions to make it
clearer what guarantees about commit behavior are provided to callers.
(See commit f92944137 for context.)

Although this is only a comment change, it's really documentation
aimed at authors of extensions, so it seems appropriate to back-patch.

Yugo Nagata and Tom Lane, per further discussion of bug #17434.

Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
2022-11-09 11:08:52 -05:00
Michael Paquier 3bdbdf5d06 Introduce pg_pwrite_zeros() in fileutils.c
This routine is designed to write zeros to a file using vectored I/O,
for a size given by its caller, being useful when it comes to
initializing a file with a final size already known.

XLogFileInitInternal() in xlog.c is changed to use this new routine when
initializing WAL segments with zeros (wal_init_zero enabled).  Note that
the aligned buffers used for the vectored I/O writes have a size of
XLOG_BLCKSZ, and not BLCKSZ anymore, as pg_pwrite_zeros() relies on
PGAlignedBlock while xlog.c originally used PGAlignedXLogBlock.

This routine will be used in a follow-up patch to do the pre-padding of
WAL segments for pg_receivewal and pg_basebackup when these are not
compressed.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Andres Freund, Thomas Munro, Michael
Paquier
Discussion: https://www.postgresql.org/message-id/CALj2ACUq7nAb7%3DbJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA%40mail.gmail.com
2022-11-08 12:23:46 +09:00
Andres Freund c3652cd84a Remove redundant breaks in HeapTupleSatisfiesVisibility
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_ZJg_N7zHtWP+JoSY_hrce4+GKioL137Y2c2En-kuXQ7g@mail.gmail.com
2022-11-05 01:31:17 -07:00
Michael Paquier 2a71de8915 Remove unneeded includes of <sys/stat.h>
Since bfb9dfd, none of the files updated in this commit have any stat()
calls, so these inclusions are not necessary, for the same reasons as
233cf6e.

Per discussion with John Naylor.

Discussion: https://postgr.es/m/CAFBsxsGGGX7KD6RxbNoSJzuSc8Gz3hOxcfhTOMLB_hJcm68dKQ@mail.gmail.com
2022-11-05 12:31:28 +09:00
Amit Kapila 568546f7e4 Improve the description of XLOG_RUNNING_XACTS.
Previously, the description of XLOG_RUNNING_XACTS showed only
top-transaction XIDs and whether subtransactions overflowed. This commit
improves it to show individual subtransaction XIDs. This also improves the
description of overflowed subtransactions.

This additional information can be helpful for testing and debugging
purposes.

Author: Masahiko Sawada
Reviewd by: Fujii Masao, Kyotaro Horiguchi, Ashutosh Bapat, Bharath Rupireddy
Discussion: https://postgr.es/m/CAD21AoAqvaE+XEeXHHPdAGQPcCoGXxuoeutq_nWhUSQvTt5+tA@mail.gmail.com
2022-11-02 10:06:55 +05:30
David Rowley 7c335b7a20 Add doubly linked count list implementation
We have various requirements when using a dlist_head to keep track of the
number of items in the list.  This, traditionally, has been done by
maintaining a counter variable in the calling code.  Here we tidy this up
by adding "dclist", which is very similar to dlist but also keeps track of
the number of items stored in the list.

Callers may use the new dclist_count() function when they need to know how
many items are stored. Obtaining the count is an O(1) operation.

For simplicity reasons, dclist and dlist both use dlist_node as their node
type and dlist_iter/dlist_mutable_iter as their iterator type. dclists
have all of the same functionality as dlists except there is no function
named dclist_delete().  To remove an item from a list dclist_delete_from()
must be used.  This requires knowing which dclist the given item is stored
in.

Additionally, here we also convert some dlists where additional code
exists to keep track of the number of items stored and to make these use
dclists instead.

Author: David Rowley
Reviewed-by: Bharath Rupireddy, Aleksander Alekseev
Discussion: https://postgr.es/m/CAApHDvrtVxr+FXEX0VbViCFKDGxA3tWDgw9oFewNXCJMmwLjLg@mail.gmail.com
2022-11-02 14:06:05 +13:00
Michael Paquier d9d873bac6 Clean up some inconsistencies with GUC declarations
This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).

Some of the initial values of the GUCs updated rely on a compile-time
default.  These are refactored so as the GUC table and its C declaration
use the same values.  This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.

Extracted from a larger patch by Peter Smith.  The spots updated in the
modules are from me.

Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
2022-10-31 12:44:48 +09:00
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
David Rowley d37aa3d358 Allow nodeSort to perform Datum sorts for byref types
Here we add a new 'copy' parameter to tuplesort_getdatum so that we can
instruct the function not to datumCopy() byref Datums before returning.

Similar to 91e9e89dc, this can provide significant performance
improvements in nodeSort when sorting by a single byref column and the
sort's targetlist contains only that column.

This allows us to re-enable Datum sorts for byref types which was disabled
in 3a5817695 due to a reported memory leak.

Additionally, here we slightly optimize DISTINCT aggregates so that we no
longer perform any datumCopy() when we find the current value not to be
distinct from the previous value.  Previously the code would always take a
copy of the most recent Datum and pfree the previous value, even when the
values were the same.  Testing shows a small but noticeable performance
increase when aggregate transitions are skipped due to the current
transition value being the same as the prior one.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqS6wC5U==k9Hd26E4EQXH3QR67-T4=Q1rQ36NGvjfVSg@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvqHonfe9G1cVaKeHbDx70R_zCrM3qP2AGXpGrieSKGnhA@mail.gmail.com
2022-10-28 09:25:12 +13:00
Heikki Linnakangas 0e972f50fd Update outdated comment for TransactionIdSetTreeStatus
Commit 06da3c570f changed the way subtransactions are marked as
SUBCOMMITTED, but the example it included actually documented the old
way. Update it.

Author: Japin Li
Discussion: https://www.postgresql.org/message-id/MEYP282MB16690BC96DFBE08CC857E1E3B6319%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-10-25 21:43:52 +02:00
Alvaro Herrera 8328a15f8f
Fix recently added incorrect assertion
Commit df3737a651 added an incorrect assertion about the preconditions
for invoking the backup cleanup callback: it misfires at session end in
case a backup completes successfully.  Fix it, using coding from Michaël
Paquier.  Also add some tests for the various cases.

Reported by Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20221021.161038.1277961198945653224.horikyota.ntt@gmail.com
2022-10-24 12:02:33 +02:00
Michael Paquier 6cc66197ff Improve memory handling across SQL-callable backup functions
Since pg_backup_start() and pg_backup_stop() exist, the tablespace map
data and the backup state data (backup_label string until 7d70809) have
been allocated in the TopMemoryContext.  This approach would cause
memory leaks in the session calling these functions if failures happen
before pg_backup_stop() ends, leaking more memory on repeated failures.
Both things need little memory so that would not be really noticeable
for most users, except perhaps connection poolers with long-lived
connections able to trigger backup failures with these functions.

This commit improves the logic in this area by not allocating anymore
the backup-related data that needs to travel across the SQL-callable
backup functions in TopMemoryContext, by using instead a dedicated
memory context child of TopMemoryContext.  The memory context is created
in pg_backup_start() and deleted when finishing pg_backup_stop().  In
the event of an in-flight failure, this memory context gets reset in the
follow-up pg_backup_start() call, so as we are sure that only one run
worth of data is leaked at any time.  Some cleanup was already done for
the backup data on a follow-up call of pg_backup_start(), but using a
memory context makes the whole simpler.

BASE_BACKUP commands are executed in isolation, relying on the memory
context created for replication commands, hence these do not need such
an extra logic.

Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Alvaro Herrera, Cary Huang, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACXqvfKF2B0beQ=aJMdWnpNohmBPsRg=EDQj_6y1t2O8mQ@mail.gmail.com
2022-10-22 11:54:02 +09:00
Alvaro Herrera 342bb38bfe
Get rid of XLogCtlInsert->forcePageWrites
After commit 39969e2a1e, ->forcePageWrites is no longer very
interesting: we can just test whether runningBackups is different from 0.
This simplifies some code, so do away with it.

Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/39969e2a1e4d7f5a37f3ef37d53bbfe171e7d77a
2022-10-19 12:35:00 +02:00
Alvaro Herrera df3737a651
Remove pg_backup_start_callback and reuse similar code
We had two copies of almost identical logic to revert shared memory
state when a running backup aborts; we can remove
pg_backup_start_callback if we adapt do_pg_abort_backup so that it can
be used for this purpose too.

However, in order for this to work, we have to repurpose the flag passed
to do_pg_abort_backup.  It used to indicate whether to throw a warning
(and the only caller always passed true).  It now indicates whether the
callback is being called at start time (in which case the session backup
state is known not to have been set to RUNNING yet, so action is always
taken) or shmem time (in which case action is only taken if the session
backup state is RUNNING).  Thus the meaning of the flag is no longer
superfluous, but it's actually quite critical to get right.  I (Álvaro)
chose to change the polarity and the code flow re. the flag from what
Bharath submitted, for coding clarity.

Co-authored-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/20221013111330.564fk5tkwe3ha77l%40alvherre.pgsql
2022-10-19 10:37:06 +02:00
Michael Paquier a19e5cee63 Rename SetSingleFuncCall() to InitMaterializedSRF()
Per discussion, the existing routine name able to initialize a SRF
function with materialize mode is unpopular, so rename it.  Equally, the
flags of this function are renamed, as of:
- SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC
- SRF_SINGLE_BLESS -> MAT_SRF_BLESS
The previous function and flags introduced in 9e98583 are kept around
for compatibility purposes, so as any extension code already compiled
with v15 continues to work as-is.  The declarations introduced here for
compatibility will be removed from HEAD in a follow-up commit.

The new names have been suggested by Andres Freund and Melanie
Plageman.

Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Backpatch-through: 15
2022-10-18 10:22:35 +09:00
Andres Freund 309b2cf243 Have GetCurrentTransactionStopTimestamp() set xactStopTimestamp if unset
Previously GetCurrentTransactionStopTimestamp() computed a new timestamp
whenever xactStopTimestamp was unset and xactStopTimestamp was only set when a
commit or abort record was written.

An upcoming patch will add additional calls to
GetCurrentTransactionStopTimestamp() from pgstats. To avoid computing
timestamps multiple times, set xactStopTimestamp in
GetCurrentTransactionStopTimestamp() if not already set.

Author: Dave Page <dpage@pgadmin.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://postgr.es/m/20220906155325.an3xesq5o3fq36gt%40awork3.anarazel.de
2022-10-14 11:11:33 -07:00
Michael Paquier 56b662523f Fix ordering issue with WAL operations in GIN fast insert path
Contrary to what is documented in src/backend/access/transam/README,
ginHeapTupleFastInsert() had a few ordering issues with the way it does
its WAL operations when inserting items in its fast path.

First, when using a separate list, XLogBeginInsert() was being always
called before START_CRIT_SECTION(), and in this case a second thing was
wrong when merging lists, as an exclusive lock was taken on the tail
page *before* calling XLogBeginInsert().  Finally, when inserting items
into a tail page, the order of XLogBeginInsert() and
START_CRIT_SECTION() was reversed.  This commit addresses all these
issues by moving the calls of XLogBeginInsert() after all the pages
logged are locked and pinned, within a critical section.

Author:  Matthias van de Meent, Zhang Mingli
Discussion: https://postgr.es/m/CAEze2WhL8uLMqynnnCu1LAPwxD5RKEo0nHV+eXGg_N6ELU88HQ@mail.gmail.com
2022-10-13 09:31:57 +09:00
Michael Paquier a1176c67c4 Simplify some maths in xlogreader.c
An LSN was calculated from a segment number, a segment size and a
position offset, matching exactly the LSN given by the caller of
XLogReaderValidatePageHeader().  This change removes the extra LSN
calculation, relying only on the LSN given by the function caller
instead.

Author: Bharath Rupireddy
Reviewed-by: Richard Guo, Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CALj2ACXuh4Ms9j9sxMYdtHEe=5sFcyrs-GAHyADu_A_G71kZTg@mail.gmail.com
2022-10-12 09:59:36 +09:00
John Naylor 6291b2546c Remove unnecessary semicolons after goto labels
According to the C standard, a label must followed by a statement.
If there was ever a time we needed an empty statement here, it was
a long time ago.

Japin Li

Reviewed by Julien Rouhaud
Discussion: https://www.postgresql.org/message-id/MEYP282MB16690F40189A4F060B41D56DB65E9%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-10-10 15:08:38 +07:00
Peter Eisentraut f14aad5169 Remove unnecessary uses of Abs()
Use C standard abs() or fabs() instead.

Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/4beb42b5-216b-bce8-d452-d924d5794c63%40enterprisedb.com
2022-10-07 13:29:33 +02:00
Michael Paquier 5757141cae Fix comment in xlogprefetcher.c
Author: Sho Kato
Discussion: https://postgr.es/m/TYCPR01MB684954052EC534A3261B29249F5C9@TYCPR01MB6849.jpnprd01.prod.outlook.com
2022-10-06 20:25:02 +09:00
David Rowley 2d0bbedda7 Rename shadowed local variables
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.

This fixes 63 warnings and leaves just 5.

Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-10-05 21:01:41 +13:00
Michael Paquier c42cd05c58 Cleanup useless assignments and checks
This cleans up a couple of areas:
- Remove XLogSegNo calculation for the last WAL segment in backup in
xlog.c (7d70809 has moved this logic entirely to xlogbackup.c when
building the contents of the backup history file).
- Remove check on log_min_duration in analyze.c, as it is already true
where this code path is reached.
- Simplify call to find_option() in guc.c.

Author: Ranier Vilela
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAEudQArCDQQiPiFR16=yu9k5s2tp4tgEe1U1ZbkW4ofx81AWWQ@mail.gmail.com
2022-10-04 13:16:23 +09:00
Michael Paquier 83e42a0035 doc: Fix some grammar and typos
This fixes some areas related to logical replication and custom RMGRs.

Author: Ekaterina Kiryanova
Discussion: https://postgr.es/m/fa4773f1-1396-384a-bcd7-85b5e013f399@postgrespro.ru
Backpatch-through: 15
2022-10-01 15:28:02 +09:00
Tom Lane 2dc2e4e31a Avoid improbable PANIC during heap_update, redux.
Commit 34f581c39 intended to ensure that RelationGetBufferForTuple
would acquire a visibility-map page pin in case the otherBuffer's
all-visible bit had become set since we last had lock on that page.
But I missed a case: when we're extending the relation, VM concerns
were dealt with only in the relatively-less-likely case that we
fail to conditionally lock the otherBuffer.  I think I'd believed
that we couldn't need to worry about it if the conditional lock
succeeds, which is true for the target buffer; but the otherBuffer
was unlocked for awhile so its bit might be set anyway.  So we need
to do the GetVisibilityMapPins dance, and then also recheck the
page's free space, in both cases.

Per report from Jaime Casanova.  Back-patch to v12 as the previous
patch was (although there's still no evidence that the bug is
reachable pre-v14).

Discussion: https://postgr.es/m/E1lWLjP-00006Y-Ml@gemulon.postgresql.org
2022-09-30 19:37:00 -04:00
Michael Paquier 0823d061b0 Introduce SYSTEM_USER
SYSTEM_USER is a reserved keyword of the SQL specification that,
roughly described, is aimed at reporting some information about the
system user who has connected to the database server.  It may include
implementation-specific information about the means by the user
connected, like an authentication method.

This commit implements SYSTEM_USER as of auth_method:identity, where
"auth_method" is a keyword about the authentication method used to log
into the server (like peer, md5, scram-sha-256, gss, etc.) and
"identity" is the authentication identity as introduced by 9afffcb (peer
sets authn to the OS user name, gss to the user principal, etc.).  This
format has been suggested by Tom Lane.

Note that thanks to d951052, SYSTEM_USER is available to parallel
workers.

Bump catalog version.

Author: Bertrand Drouvot
Reviewed-by: Jacob Champion, Joe Conway, Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/7e692b8c-0b11-45db-1cad-3afc5b57409f@amazon.com
2022-09-29 15:05:40 +09:00
Thomas Munro b6d8a60aba Restore pg_pread and friends.
Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of
the pg_ prefixes where we use pread(), pwrite() and vectored variants.

We dropped support for ancient Unixes where we needed to use lseek() to
implement replacements for those, but it turns out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile() if the file handle is synchronous, despite
its documentation saying otherwise.

Switching to asynchronous file handles would fix that, but have other
complications.  For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect, which we can now
describe as Windows-only.

Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
2022-09-29 13:12:11 +13:00
Tom Lane 4d2a844242 Allow callback functions to deregister themselves during a call.
Fetch the next-item pointer before the call not after, so that
we aren't dereferencing a dangling pointer if the callback
deregistered itself during the call.  The risky coding pattern
appears in CallXactCallbacks, CallSubXactCallbacks, and
ResourceOwnerReleaseInternal.  (There are some other places that
might be at hazard if they offered deregistration functionality,
but they don't.)

I (tgl) considered back-patching this, but desisted because it
wouldn't be very safe for extensions to rely on this working in
pre-v16 branches.

Hao Wu

Discussion: https://postgr.es/m/CAH+9SWXTiERkmhRke+QCcc+jRH8d5fFHTxh8ZK0-Yn4BSpyaAg@mail.gmail.com
2022-09-28 11:23:27 -04:00
Robert Haas a448e49bcb Revert 56-bit relfilenode change and follow-up commits.
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
2022-09-28 09:55:28 -04:00
Michael Paquier 2e560b974e Fix some comments of do_pg_backup_start() and do_pg_backup_stop()
Both functions referred to an incorrect variable name, so make the whole
more consistent.

Oversight in 7d70809.

Author: Kyotaro Horiguchi, Bharath Rupireddy
Discussion: https://postgr.es/m/20220927.172427.467118514018439476.horikyota.ntt@gmail.com
2022-09-28 09:58:44 +09:00
Robert Haas 4667d97ca6 Fix typos in commit 05d4cbf9b6.
Reported by Justin Pryzby.

Discussion: http://postgr.es/m/20220927185121.GE6256@telsasoft.com
2022-09-27 15:34:17 -04:00
Peter Eisentraut c8b2ef05f4 Convert *GetDatum() and DatumGet*() macros to inline functions
The previous macro implementations just cast the argument to a target
type but did not check whether the input type was appropriate.  The
function implementation can do better type checking of the input type.

For the *GetDatumFast() macros, converting to an inline function
doesn't work in the !USE_FLOAT8_BYVAL case, but we can use
AssertVariableIsOfTypeMacro() to get a similar level of type checking.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-09-27 20:50:21 +02:00
Robert Haas 05d4cbf9b6 Increase width of RelFileNumbers from 32 bits to 56 bits.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.

If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.

This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.

Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-27 13:25:21 -04:00
Michael Paquier 78fdb1e50f Mark ParallelMessagePending as sig_atomic_t
ParallelMessagePending was previously marked as a boolean which should
be fine on modern platforms, but the C standard recommends the use of
sig_atomic_t for variables manipulated in signal handlers.

Author: Hayato Kuroda
Discussion: https://postgr.es/m/TYAPR01MB58667C15A95A234720F4F876F5529@TYAPR01MB5866.jpnprd01.prod.outlook.com
2022-09-27 09:29:56 +09:00
Michael Paquier e1e6f8f3df Remove dependency to StringInfo in xlogbackup.{c.h}
This was used as the returned result type of the generated contents for
the backup_label and backup history files.  This is replaced by a simple
string, reducing the cleanup burden of all the callers of
build_backup_content().

Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/YzERvNPaZivHEKZJ@paquier.xyz
2022-09-27 09:15:07 +09:00
Michael Paquier 7d708093b7 Refactor creation of backup_label and backup history files
This change simplifies some of the logic related to the generation and
creation of the backup_label and backup history files, which has become
unnecessarily complicated since the removal of the exclusive backup mode
in commit 39969e2.  The code was previously generating the contents of
these files as a string (start phase for the backup_label and stop phase
for the backup history file), one problem being that the contents of the
backup_label string were scanned to grab some of its internal contents
at the stop phase.

This commit changes the logic so as we store the data required to build
these files in an intermediate structure named BackupState.  The
backup_label file and backup history file strings are generated when
they are ready to be sent back to the client.  Both files are now
generated with the same code path.  While on it, this commit renames
some variables for clarity.

Two new files named xlogbackup.{c,h} are introduced in this commit, to
remove from xlog.c some of the logic around base backups.  Note that
more could be moved to this new set of files.

Author: Bharath Rupireddy, Michael Paquier
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CALj2ACXWwTDgJqCjdaPyfR7djwm6SrybGcrZyrvojzcsmt4FFw@mail.gmail.com
2022-09-26 11:15:47 +09:00
Peter Eisentraut 26f7802beb Message style improvements 2022-09-24 18:41:25 -04:00
Jeff Davis 163b0993a1 Fix race condition where heap_delete() fails to pin VM page.
Similar to 5f12bc94dc, the code must re-check PageIsAllVisible() after
buffer lock is re-acquired. Backpatching to the same version, 12.

Discussion: https://postgr.es/m/CAEP4nAw9jYQDKd_5Y+-s2E4YiUJq1vqiikFjYGpLShtp-K3gag@mail.gmail.com
Reported-by: Robins Tharakan
Reviewed-by: Robins Tharakan
Backpatch-through: 12
2022-09-22 11:04:00 -07:00
Andres Freund e6927270cd meson: Add initial version of meson based build system
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.

After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.

We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.

This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).

Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.

When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.

The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.

Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson

With contributions from Thomas Munro, John Naylor, Stone Tickle and others.

Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
2022-09-21 22:37:17 -07:00
Michael Paquier fbb5f54b67 Clear ps display of startup process at the end of recovery
If the ps display is not cleared at this point, the process could
continue displaying "recovering NNN" even if handling end-of-recovery
steps.  df9274a has tackled that by providing some information with the
end-of-recovery checkpoint but 7ff23c6 has nullified the effect of the
first commit.

Per a suggestion from Justin, just clear the ps display when we are done
with recovery, so as no incorrect information is displayed.  This may
get extended in the future, but for now restore the pre-7ff23c6
behavior.

Author: Justin Prysby
Discussion: https://postgr.es/m/20220913223954.GU31833@telsasoft.com
Backpatch-through: 15
2022-09-22 14:25:09 +09:00
Tom Lane 152c9f7b8f Suppress variable-set-but-not-used warnings from clang 15.
clang 15+ will issue a set-but-not-used warning when the only
use of a variable is in autoincrements (e.g., "foo++;").
That's perfectly sensible, but it detects a few more cases that
we'd not noticed before.  Silence the warnings with our usual
methods, such as PG_USED_FOR_ASSERTS_ONLY, or in one case by
actually removing a useless variable.

One thing that we can't nicely get rid of is that with %pure-parser,
Bison emits "yynerrs" as a local variable that falls foul of this
warning.  To silence those, I inserted "(void) yynerrs;" in the
top-level productions of affected grammars.

Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
annoying compiler warnings but changes no behavior.  Hence,
back-patch to 9.5, which is as far as these patches go without
issues.  (A preliminary check shows that the prior branches
need some other set-but-not-used cleanups too, so I'll leave
them for another day.)

Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us
2022-09-20 12:04:37 -04:00
Peter Geoghegan bfcf1b3480 Harmonize parameter names in storage and AM code.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in storage, catalog,
access method, executor, and logical replication code, as well as in
miscellaneous utility/library code.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 19:18:36 -07:00
Peter Geoghegan 4bac9600f0 Harmonize heapam and tableam parameter names.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions.  Having parameter names
that are reliably consistent in this way will make it easier to reason
about groups of related C functions from the same translation unit as a
module.  It will also make certain refactoring tasks easier.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 16:46:23 -07:00
John Naylor 08f8af983a Fix typos referring to PGPROC
Japin Li

Reviewed by Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/MEYP282MB1669459813B36FB5EAA38434B6499@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-09-19 11:36:51 +07:00
Noah Misch b4f584f9d2 Reset InstallXLogFileSegmentActive after walreceiver self-initiated exit.
After commit cc2c7d65fc added this flag,
failure to reset it caused assertion failures.  In non-assert builds, it
made the system fail to achieve the objectives listed in that commit;
chiefly, we might emit a spurious log message.  Back-patch to v15, where
that commit first appeared.

Bharath Rupireddy and Kyotaro Horiguchi.  Reviewed by Dilip Kumar,
Nathan Bossart and Michael Paquier.  Reported by Dilip Kumar.

Discussion: https://postgr.es/m/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
2022-09-15 06:45:23 -07:00
Tom Lane 31dcfae83c Use the terminology "WAL file" not "log file" more consistently.
Referring to the WAL as just "log" invites confusion with the
postmaster log, so avoid doing that in docs and error messages.
Also shorten "WAL segment file" to just "WAL file" in various
places.

Bharath Rupireddy, reviewed by Nathan Bossart and Kyotaro Horiguchi

Discussion: https://postgr.es/m/CALj2ACUeXa8tDPaiTLexBDMZ7hgvaN+RTb957-cn5qwv9zf-MQ@mail.gmail.com
2022-09-14 18:40:58 -04:00
Tom Lane 0a20ff54f5 Split up guc.c for better build speed and ease of maintenance.
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation.  It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:

* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
  SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
  built-in GUC variables, along with some already-exported constant
  tables.
* GUC check/assign/show hook functions are moved to the variable's
  home module, whenever that's clearly identifiable.  A few hard-
  to-classify hooks ended up in commands/variable.c, which was
  already a home for miscellaneous GUC hook functions.

To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module.  That allowed removal of #include "guc.h"
from some existing headers.  The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.

There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments.  But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.

Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.

Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
2022-09-13 11:11:45 -04:00
Alvaro Herrera 6710e83a67
Remove useless pstrdups in untransformRelOptions
The two strings are already a single palloc'd chunk, not freed; there's
no reason to allocate separate copies that have the same lifetime.

This code is only called in short-lived memory contexts (except in some
cases in TopTransactionContext, which is still short-lived enough not to
really matter), and typically only for short arrays, so the memory or
computation saved is likely negligible.  However, let's fix it to avoid
leaving a bad example of code to copy.  This is the only place I could
find where we're doing this with makeDefElem().

Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20220909142050.3vv2hjekppk265dd@alvherre.pgsql
2022-09-13 11:59:31 +02:00
Michael Paquier bb629c294b Rename macro related to pg_backup_stop()
This should have been part of 39969e2 that has renamed pg_stop_backup()
to pg_backup_stop(), and this one is the last reference to
pg_stop/start_backup() I could find in the tree.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXjvC28ppeDTCrfaSyHga0ggP5nRLJbsjx=7N-74UT4QA@mail.gmail.com
2022-09-13 10:45:43 +09:00
David Rowley 6b89ce1a21 Don't reference out-of-bounds array elements in brin_minmax_multi.c
The primary fix here is to fix has_matching_range() so it does not
reference ranges->values[-1] when nranges == 0.  Similar problems existed
in AssertCheckRanges() too.  It does not look like any of these problems
could lead to a crash as the array in question is at the end of the Ranges
struct, and values[-1] is memory that belongs to other fields in the
struct.  However, let's get rid of these rather unsafe coding practices.

In passing, I (David) adjusted some comments to try to make it more clear
what some of the fields are for in the Ranges struct.  I had to study the
code to find out what nsorted was for as I couldn't tell from the
comments.

Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqJQzPitufX-jR=YUbJafpCDAKUnwgdbX_MzSc93wuvdw@mail.gmail.com
Backpatch-through: 14, where multi-range brin was added.
2022-09-13 11:02:56 +12:00
Peter Eisentraut e8d78581bb Revert "Convert *GetDatum() and DatumGet*() macros to inline functions"
This reverts commit 595836e99b.

It has problems when USE_FLOAT8_BYVAL is off.
2022-09-12 19:57:07 +02:00
Peter Eisentraut 595836e99b Convert *GetDatum() and DatumGet*() macros to inline functions
The previous macro implementations just cast the argument to a target
type but did not check whether the input type was appropriate.  The
function implementation can do better type checking of the input type.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-09-12 17:36:26 +02:00
Peter Eisentraut 5015e1e1b5 Assorted examples of expanded type-safer palloc/pg_malloc API
This adds some uses of the new palloc/pg_malloc variants here and
there as a demonstration and test.  This is kept separate from the
actual API patch, since the latter might be backpatched at some point.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
2022-09-12 08:45:03 +02:00
Michael Paquier df4a056619 Add more error context to RestoreBlockImage() and consume it
On failure in restoring a block image, no details were provided, while
it is possible to see failure with an inconsistent record state, a
failure in processing decompression or a failure in decompression
because a build does not support this option.

RestoreBlockImage() is used in two code paths in the backend code,
during recovery and when checking a page consistency after applying
masking, and both places are changed to consume the error message
produced by the internal routine when it returns a false status.  All
the error messages are reported under ERRCODE_INTERNAL_ERROR, that gets
used also when attempting to access a page compressed by a method
not supported by the build attempting the decompression.  This is
something that can happen in core when doing physical replication with
primary and standby using inconsistent build options, for example.

This routine is available since 2c03216d and it has never provided any
context about the error happening when it failed.  This change is
justified even more after 57aa5b2, that introduced compression of FPWs
in WAL.

Reported-by: Justin Prysby
Author: Michael Paquier
Discussion: https://postgr.es/m/20220905002320.GD31833@telsasoft.com
Backpatch-through: 15
2022-09-09 10:00:40 +09:00
Peter Geoghegan d977ffd923 Instrument freezing in autovacuum log reports.
Add a new line to log reports from autovacuum (as well as VACUUM VERBOSE
output) that shows information about freezing.  Emphasis is placed on
the total number of heap pages that had one or more tuples frozen by
VACUUM.  The total number of tuples frozen is also shown.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Janes <jeff.janes@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznTY6D0zyE8VLrC6Gd4kh_HGAXxnTPtcOQOOsxzLx9zog@mail.gmail.com
2022-09-08 10:29:39 -07:00
Thomas Munro adb466150b Fix recovery_prefetch with low maintenance_io_concurrency.
We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.

That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2.  Fix the ordering, and also remove
that clamp.  We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.

Also add an explicit error message for missing contrecords.  It was a
bit strange that we didn't report an error already, and became a latent
bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment.  Reporting an error
prevents that.

Back-patch to 15.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
2022-09-08 21:44:55 +12:00
David Rowley 8b26769bc4 Fix an assortment of improper usages of string functions
In a similar effort to f736e188c and 110d81728, fixup various usages of
string functions where a more appropriate function is available and more
fit for purpose.

These changes include:

1. Use cstring_to_text_with_len() instead of cstring_to_text() when
   working with a StringInfoData and the length can easily be obtained.
2. Use appendStringInfoString() instead of appendStringInfo() when no
   formatting is required.
3. Use pstrdup(...) instead of psprintf("%s", ...)
4. Use pstrdup(...) instead of psprintf(...) (with no formatting)
5. Use appendPQExpBufferChar() instead of appendPQExpBufferStr() when the
   length of the string being appended is 1.
6. appendStringInfoChar() instead of appendStringInfo() when no formatting
   is required and string is 1 char long.
7. Use appendPQExpBufferStr(b, .) instead of appendPQExpBuffer(b, "%s", .)
8. Don't use pstrdup when it's fine to just point to the string constant.

I (David) did find other cases of #8 but opted to use #4 instead as I
wasn't certain enough that applying #8 was ok (e.g in hba.c)

Author: Ranier Vilela, David Rowley
Discussion: https://postgr.es/m/CAApHDvo2j2+RJBGhNtUz6BxabWWh2Jx16wMUMWKUjv70Ver1vg@mail.gmail.com
2022-09-06 13:19:44 +12:00
Peter Eisentraut 6bcda4a721 Fix incorrect uses of Datum conversion macros
Since these macros just cast whatever you give them to the designated
output type, and many normal uses also cast the output type further, a
number of incorrect uses go undiscovered.  The fixes in this patch
have been discovered by changing these macros to inline functions,
which is the subject of a future patch.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-09-05 13:30:44 +02:00
Thomas Munro 932b016300 Fix cache invalidation bug in recovery_prefetch.
XLogPageRead() can retry internally after a pread() system call has
succeeded, in the case of short reads, and page validation failures
while in standby mode (see commit 0668719801).  Due to an oversight in
commit 3f1ce973, these cases could leave stale data in the internal
cache of xlogreader.c without marking it invalid.  The main defense
against stale cached data on failure to read a page was in the error
handling path of the calling function ReadPageInternal(), but that
wasn't quite enough for errors handled internally by XLogPageRead()'s
retry loop if we then exited with XLREAD_WOULDBLOCK.

1.  ReadPageInternal() now marks the cache invalid before calling the
    page_read callback, by setting state->readLen to 0.  It'll be set to
    a non-zero value only after a successful read.  It'll stay valid as
    long as the caller requests data in the cached range.

2.  XLogPageRead() no long performs internal retries while reading
    ahead.  While such retries should work, the general philosophy is
    that we should give up prefetching if anything unusual happens so we
    can handle it when recovery catches up, to reduce the complexity of
    the system.  Let's do that here too.

3.  While here, a new function XLogReaderResetError() improves the
    separation between xlogrecovery.c and xlogreader.c, where the former
    previously clobbered the latter's internal error buffer directly.
    The new function makes this more explicit, and also clears a related
    flag, without which a standby would needlessly retry in the outer
    function.

Thanks to Noah Misch for tracking down the conditions required for a
rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and
providing a reproducer.

Back-patch to 15.

Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com
2022-09-03 13:28:43 +12:00
Michael Paquier bfb9dfd937 Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()
Several backend-side loops scanning one or more directories with
ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory
copy, temporary file removal, configuration file parsing, some logical
decoding logic and some pgtz stuff) already know the type of the entry
being scanned thanks to the dirent structure associated to the entry, on
platforms where we know about DT_REG, DT_DIR and DT_LNK to make the
difference between a regular file, a directory and a symbolic link.

Relying on the direct structure of an entry saves a few system calls to
stat() and lstat() in the loops updated here, shaving some code while on
it.  The logic of the code remains the same, calling stat() or lstat()
depending on if it is necessary to look through symlinks.

Authors: Nathan Bossart, Bharath Rupireddy
Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
2022-09-02 16:58:06 +09:00
Peter Geoghegan c3ffa731a5 Derive freeze cutoff from nextXID, not OldestXmin.
Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs
to freeze were determined at the start of each VACUUM by taking related
cutoffs that represent which XIDs/MXIDs VACUUM should treat as still
running, and subtracting an XID/MXID age based value controlled by GUCs
like vacuum_freeze_min_age.  The FreezeLimit cutoff (XID freeze cutoff)
was derived by subtracting an XID age value from OldestXmin, while the
MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting
an MXID age value from OldestMxact.  This approach didn't match the
approach used nearby to determine whether this VACUUM operation should
be an aggressive VACUUM or not.

VACUUM now uses the standard approach instead: it subtracts the same
age-based values from next XID/next MXID (rather than subtracting from
OldestXmin/OldestMxact).  This approach is simpler and more uniform.
Most of the time it will have only a negligible impact on how and when
VACUUM freezes.  It will occasionally make VACUUM more robust in the
event of problems caused by long running transaction.  These are cases
where OldestXmin and OldestMxact are held back by so much that they
attain an age that is a significant fraction of the value of age-based
settings like vacuum_freeze_min_age.

There is no principled reason why freezing should be affected in any way
by the presence of a long-running transaction -- at least not before the
point that the OldestXmin and OldestMxact limits used by each VACUUM
operation attain an age that makes it unsafe to freeze some of the
XIDs/MXIDs whose age exceeds the value of the relevant age-based
settings.  The new approach should at least make freezing degrade more
gracefully than before, even in the most extreme cases.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
2022-08-31 11:37:35 -07:00
Peter Geoghegan 9887dd38f9 Adjust comments that called MultiXactIds "XMIDs".
Oversights in commits 0b018fab and f3c15cbe.
2022-08-29 19:42:30 -07:00
Tom Lane 7fed801135 Clean up inconsistent use of fflush().
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).

Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls.  While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order.  Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.

Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
2022-08-29 13:55:41 -04:00
Robert Haas 6672d79139 Prevent WAL corruption after a standby promotion.
When a PostgreSQL instance performing archive recovery but not using
standby mode is promoted, and the last WAL segment that it attempted
to read ended in a partial record, the previous code would create
invalid WAL on the new timeline. The WAL from the previously timeline
would be copied to the new timeline up until the end of the last valid
record, but instead of beginning to write WAL at immediately
afterwards, the promoted server would write an overwrite contrecord at
the beginning of the next segment. The end of the previous segment
would be left as all-zeroes, resulting in failures if anything tried
to read WAL from that file.

The root of the issue is that ReadRecord() decides whether to set
abortedRecPtr and missingContrecPtr based on the value of StandbyMode,
but ReadRecord() switches to a new timeline based on the value of
ArchiveRecoveryRequested. We shouldn't try to write an overwrite
contrecord if we're switching to a new timeline, so change the test in
ReadRecod() to check ArchiveRecoveryRequested instead.

Code fix by Dilip Kumar. Comments by me incorporating suggested
language from Álvaro Herrera. Further review from Kyotaro Horiguchi
and Sami Imseih.

Discussion: http://postgr.es/m/CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com
Discussion: http://postgr.es/m/FB0DEA0B-E14E-43A0-811F-C1AE93D00FF3%40amazon.com
2022-08-29 11:07:37 -04:00
David Rowley d389487525 Small refactor to get rid of -Wshadow=compatible-local warning
Further reduce -Wshadow=compatible-local warnings by 1 by refactoring the
code in gistRelocateBuildBuffersOnSplit() to make use of
foreach_current_index() instead of manually incrementing a variable on
each loop.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGZX-X=Bn4moyXgfFa0CdSUwoa04d3isit3=1qo8F8Bw@mail.gmail.com
2022-08-26 02:46:56 +12:00
David Rowley 3e0fff2e68 More -Wshadow=compatible-local warning fixes
In a similar effort to f01592f91, here we're targetting fixing the
warnings where we've deemed the shadowing variable to serve a close enough
purpose to the shadowed variable just to reuse the shadowed version and
not declare the shadowing variable at all.

By my count, this takes the warning count from 106 down to 71.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
2022-08-26 02:35:40 +12:00
David Rowley f959bf9a5b Further -Wshadow=compatible-local warning fixes
These should have been included in 421892a19 as these shadowed variable
warnings can also be fixed by adjusting the scope of the shadowed variable
to put the declaration for it in an inner scope.

This is part of the same effort as f01592f91.

By my count, this takes the warning count from 114 down to 106.

Author: David Rowley and Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-24 22:04:28 +12:00
Michael Paquier d951052a9e Allow parallel workers to retrieve some data from Port
This commit moves authn_id into a new global structure called
ClientConnectionInfo (mapping to a MyClientConnectionInfo for each
backend) which is intended to hold all the client information that
should be shared between the backend and any of its parallel workers,
access for extensions and triggers being the primary use case.  There is
no need to push all the data of Port to the workers, and authn_id is
quite a generic concept so using a separate structure provides the best
balance (the name of the structure has been suggested by Robert Haas).

While on it, and per discussion as this would be useful for a potential
SYSTEM_USER that can be accessed through parallel workers, a second
field is added for the authentication method, copied directly from
Port.

ClientConnectionInfo is serialized and restored using a new parallel
key and a structure tracks the length of the authn_id, making the
addition of more fields straight-forward.

Author: Jacob Champion
Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane,
Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.camel@vmware.com
2022-08-24 12:57:13 +09:00
David Rowley 421892a192 Further reduce warnings with -Wshadow=compatible-local
In a similar effort to f01592f91, here we're targetting fixing the
warnings that -Wshadow=compatible-local produces that we can fix by moving
a variable to an inner scope to stop that variable from being shadowed by
another variable declared somewhere later in the function.

All of the warnings being fixed here are changing the scope of variables
which are being used as an iterator for a "for" loop.  In each instance,
the fix happens to be changing the for loop to use the C99 type
initialization.  Much of this code likely pre-dates our use of C99.

Reducing the scope of the outer scoped variable seems like the safest way
to fix these.  Renaming seems more likely to risk patches using the wrong
variable.  Reducing the scope is more likely to result in a compilation
failure after applying some future patch rather than introducing bugs with
it.

By my count, this takes the warning count from 129 down to 114.

Author: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-08-24 12:27:12 +12:00
John Naylor 1b9050da66 Remove empty statement
Peter Smith

Discussion: https://www.postgresql.org/message-id/CAHut%2BPtRGVuj8Q_GpHHxZyk7fGwdYDG8_s4GSfKoc_4Yd9vR-w%40mail.gmail.com
2022-08-23 09:24:32 +07:00
Robert Haas ec97db399f Adjust assertion in XLogDecodeNextRecord.
As written, if you use XLogBeginRead() to position an xlogreader at
the beginning of a WAL page and then try to read WAL, this assertion
will fail. However, the header comment for XLogBeginRead() claims
that positioning an xlogreader at the beginning of a page is valid,
and the code here is perfectly able to cope with it. It's only the
assertion that causes trouble. So relax it.

This is formally a bug in all supported branches, but as it doesn't
seem to have any consequences for current uses of the xlogreader
facility, no back-patch, at least for now.

Dilip Kumar and Robert Haas

Discussion: http://postgr.es/m/CA+TgmoaJSs2_7WHW2GzFYe9+zfPtxBKvT3GW47+x=ptUE=cULw@mail.gmail.com
2022-08-18 12:22:20 -04:00
Michael Paquier d265cd2029 Use SetInstallXLogFileSegmentActive() in more places in xlog.c
This reduces the code paths where XLogCtl->InstallXLogFileSegmentActive
is directly touched, and this wrapper function does the same thing as
the original code replaced by the function call.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/CALj2ACVhkf-bC5CX-=6iBUfkO5GqmBntQH+m=HpY0iQ=-g1pRg@mail.gmail.com
2022-08-17 15:28:45 +09:00
Tomas Vondra c52ad9c4ef Fix assert in logicalmsg_desc
The assert, introduced by 9f1cf97bb5, is intended to check if the prefix
is terminated by a \0 byte, but it has two flaws. Firstly, prefix_size
includes the \0 byte, so prefix[prefix_size] points to the byte after
the null byte. Secondly, the check ensures the byte is not equal \0,
while it should be checking the opposite.

Backpatch-through: 14
Discussion: https://postgr.es/m/b99b6101-2f14-3796-3dfa-4a6cd7d4326d@enterprisedb.com
2022-08-16 23:52:10 +02:00
Robert Haas a8c0128697 Move basebackup code to new directory src/backend/backup
Reviewed by David Steele and Justin Pryzby

Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com
2022-08-10 14:03:23 -04:00
Thomas Munro 670475b2fa Fix obsolete comment in commit_ts.c.
Commit 08aa89b removed COMMIT_TS_SETTS, but left a reference in a
comment.

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/20220726173343.GA154110%40nathanxps13
2022-08-09 12:58:04 +12:00
Peter Geoghegan f68faf4c75 Fix comments about deduplication updating page.
nbtree deduplication passes add tuples from the original/target page to
a temp page, merging as necessary.  The temp page is copied back to the
target permanent page in the critical section.  This is similar to the
approach taken by nbtree page splits.

Adjust comments that referred to updating the original page in-place as
tuples were merged.  These were left over from earlier versions of the
deduplication patch that didn't yet use a temp page.
2022-08-05 14:25:49 -07:00
Peter Geoghegan b2fe783aec Add missing parenthesis to max item size macro.
Oversight in commit 92f37505, per buildfarm.
2022-08-05 13:06:19 -07:00
Alvaro Herrera e44dae07f9
BRIN: mask BRIN_EVACUATE_PAGE for WAL consistency checking
That bit is unlogged and therefore it's wrong to consider it in WAL page
comparison.

Add a test that tickles the case, as branch testing technology allows.

This has been a problem ever since wal consistency checking was
introduced (commit a507b86900 for pg10), so backpatch to all supported
branches.

Author: 王海洋 (Haiyang Wang) <wanghaiyang.001@bytedance.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CACciXAD2UvLMOhc4jX9VvOKt7DtYLr3OYRBhvOZ-jRxtzc_7Jg@mail.gmail.com
Discussion: https://postgr.es/m/CACciXADOfErX9Bx0nzE_SkdfXr6Bbpo5R=v_B6MUTEYW4ya+cg@mail.gmail.com
2022-08-05 18:00:17 +02:00
Thomas Munro d2e150831a Remove configure probe for fdatasync.
fdatasync() is in SUSv2, and all targeted Unix systems have it.  We have
a replacement function for Windows.

We retain the probe for the function declaration, which allows us to
supply the mysteriously missing declaration for macOS, and also for
Windows.  No need to keep a HAVE_FDATASYNC macro around.

Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for
Windows.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
2022-08-05 16:37:38 +12:00
Thomas Munro cf112c1220 Remove dead pread and pwrite replacement code.
pread() and pwrite() are in SUSv2, and all targeted Unix systems have
them.

Previously, we defined pg_pread and pg_pwrite to emulate these function
with lseek() on old Unixen.  The names with a pg_ prefix were a reminder
of a portability hazard: they might change the current file position.
That hazard is gone, so we can drop the prefixes.

Since the remaining replacement code is Windows-only, move it into
src/port/win32p{read,write}.c, and move the declarations into
src/include/port/win32_port.h.

No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only
used for declarations in port.h which have now moved into win32_port.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:49:21 +12:00
Thomas Munro 2b1f580ee2 Remove configure probes for symlink/readlink, and dead code.
symlink() and readlink() are in SUSv2 and all targeted Unix systems have
them.  We have partial emulation on Windows.  Code that raised runtime
errors on systems without it has been dead for years, so we can remove
that and also references to such systems in the documentation.

Define HAVE_READLINK and HAVE_SYMLINK macros on Unix.  Our Windows
replacement functions based on junction points can't be used for
relative paths or for non-directories, so the macros can be used to
check for full symlink support.  The places that deal with tablespaces
can just use symlink functions without checking the macros.  (If they
did check the macros, they'd need to provide an #else branch with a
runtime or compile time error, and it'd be dead code.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:22:56 +12:00
Daniel Gustafsson f8f20203c2 Rephrase comments to make them clearer
The use of "we" when referring to the active backend might be
misunderstood, so rephrase to make it clearer who is performing
the actions discussed in the comment.

Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Erikjan Rijkers <er@xs4all.nl>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEG8a3LRSMqkvjiURiJoSi4aGWORpiXUmUfQQK5PaD6WfPzu3w@mail.gmail.com
2022-08-04 16:30:06 +02:00
Tom Lane e09d7a1262 Improve speed of hash index build.
In the initial data sort, if the bucket numbers are the same then
next sort on the hash value.  Because index pages are kept in
hash value order, this gains a little speed by allowing the
eventual tuple insertions to be done sequentially, avoiding repeated
data movement within PageAddItem.  This seems to be good for overall
speedup of 5%-9%, depending on the incoming data.

Simon Riggs, reviewed by Amit Kapila

Discussion: https://postgr.es/m/CANbhV-FG-1ZNMBuwhUF7AxxJz3u5137dYL-o6hchK1V_dMw86g@mail.gmail.com
2022-07-28 14:34:32 -04:00
Alvaro Herrera 9e4f914b5e
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records.  Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

A fix for this problem was already attempted in 49d9cfc68b, but it
was reverted because of design issues.  This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency.  Tablespaces
are created as real directories, and should be deleted
by later replay.  CheckRecoveryConsistency ensures
they have disappeared.

The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING.  Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
2022-07-28 08:40:06 +02:00
Michael Paquier ffd1b6bb6f Add overflow protection for block-related data in WAL records
XLogRecordBlockHeader, the header holding the information for the data
related to a block, tracks the length of the data appended to the WAL
record with data_length (uint16).  This limitation in size was not
enforced by the public routine in charge of registering the data
assembled later to form the WAL record inserted, XLogRegisterBufData().
Incorrectly used, it could lead to the generation of records with some
of its data overflowed.  This commit adds some safeguards to prevent
that for the block data, complaining immediately if attempting to add to
a record block information with a size larger than UINT16_MAX, which is
the limit implied by the internal logic.

Note that this also adjusts XLogRegisterData() and XLogRegisterBufData()
so as the length of the WAL record data given by the caller is unsigned,
matching with what gets stored in XLogRecData->len.

Extracted from a larger patch by the same author.  The original patch
includes more protections when assembling a record in full that will be
looked at separately later.

Author: Matthias van de Meent
Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David
Zhang
Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com
2022-07-27 13:35:40 +09:00
Tom Lane f92944137c Force immediate commit after CREATE DATABASE etc in extended protocol.
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol.  In extended protocol, we didn't commit
until receiving a Sync message.  Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK.  In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.

This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.

To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit.  This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.

While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining.  That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.

Per bug #17434 from Yugo Nagata.  It's been wrong for ages,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
2022-07-26 13:07:03 -04:00
Amit Kapila 857dd35348 Eliminate duplicate code in table.c.
Additionally improve the error message similar to how it was done in
2ed532ee8c.

Author: Junwang Zhao, Aleksander Alekseev
Reviewed-by: Amit Kapila, Alvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEG8a3KbVtBm_BYf5tGsKHvmMieQVsq_jBPOg75VViQB7ACL8Q%40mail.gmail.com
2022-07-26 08:22:53 +05:30
Fujii Masao 2387f52962 Remove useless arguments in ReadCheckpointRecord().
This commit removes two arguments "report" and "whichChkpt"
in ReadCheckpointRecord().

"report" is obviously useless because it's always true, i.e., there are
two callers of the function and they always specify true as "report".
Commit 1d919de5eb removed the only call with "report" = false.

"whichChkpt" indicated where the specified checkpoint location
came from, pg_control or backup_label. This information was used
to report different error messages depending on where the invalid
checkpoint record came from, when it was found.
But ReadCheckpointRecord() doesn't need to do that because
its callers already do that and users can still identify where
the invalid checkpoint record came from, by reading such log messages.
Also when "whichChkpt" was 0, the word "primary checkpoint" was used
in the log message and could confuse users because the concept of
primary and secondary checkpoints was already removed before.
These are why this commit removes "whichChkpt" argument.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi
Discussion: https://postgr.es/m/fa2e12eb-81c3-0717-0272-755f8a81c8f2@oss.nttdata.com
2022-07-25 10:59:38 +09:00
Thomas Munro 5344723755 Remove unnecessary Windows-specific basebackup code.
Commit c6f2f016 added an explicit check for a Windows "junction point".
That turned out to be needed only because get_dirent_type() was busted
on Windows.  It's been fixed by commit 9d3444dc, so remove it.

Add a TAP-test to demonstrate that in-place tablespaces are copied by
pg_basebackup.  This exercises the codepath that would fail before
c6f2f016 on Windows, and shows that it still doesn't fail now that we're
using get_dirent_type() on both Windows and Unix.

Back-patch to 15, where in-place tablespaces arrived and caused this
problem (ie directories where previously only symlinks were expected).

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGLzLK4PUPx0_AwXEWXOYAejU%3D7XpxnYE55Y%2Be7hB2N3FA%40mail.gmail.com
2022-07-22 17:41:47 +12:00
Thomas Munro a1b56090eb Remove O_FSYNC and associated macros.
O_FSYNC was a pre-POSIX way of spelling O_SYNC, supported since commit
9d645fd84c for non-conforming operating systems of the time.  It's not
needed on any modern system.  We can just use standard O_SYNC directly
if it exists (= all targeted systems except Windows), and get rid of our
OPEN_SYNC_FLAG macro.

Similarly for standard O_DSYNC, we can just use that directly if it
exists (= all targeted systems except DragonFlyBSD), and get rid of our
OPEN_DATASYNC_FLAG macro.

We still avoid choosing open_datasync as a default value for
wal_sync_method if O_DSYNC has the same value as O_SYNC (= only
OpenBSD), so there is no change in default behavior.

Discussion: https://postgr.es/m/CA%2BhUKGJE7y92NY7FG2ftUbZUaqohBU65_Ys_7xF5mUHo4wirTQ%40mail.gmail.com
2022-07-22 12:41:17 +12:00
Thomas Munro 4f1f5a7f85 Remove fls(), use pg_leftmost_one_pos32() instead.
Commit 4f658dc8 provided the traditional BSD fls() function in
src/port/fls.c so it could be used in several places.  Later we added a
bunch of similar facilities in pg_bitutils.h, based on compiler
builtins that map to hardware instructions.  It's a bit confusing to
have both 1-based and 0-based variants of this operation in use in
different parts of the tree, and neither is blessed by a standard.
Let's drop fls.c and the configure probe, and reuse the newer code.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com
2022-07-22 10:41:50 +12:00
Fujii Masao b24b2be119 Fix assertion failure and segmentation fault in backup code.
When a non-exclusive backup is canceled, do_pg_abort_backup() is called
and resets some variables set by pg_backup_start (pg_start_backup in v14
or before). But previously it forgot to reset the session state indicating
whether a non-exclusive backup is in progress or not in this session.

This issue could cause an assertion failure when the session running
BASE_BACKUP is terminated after it executed pg_backup_start and
pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause
a segmentation fault when pg_backup_stop is called after BASE_BACKUP
in the same session is canceled.

This commit fixes the issue by making do_pg_abort_backup reset
that session state.

Back-patch to all supported branches.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
2022-07-20 09:57:01 +09:00
Peter Eisentraut 9fd45870c1 Replace many MemSet calls with struct initialization
This replaces all MemSet() calls with struct initialization where that
is easily and obviously possible.  (For example, some cases have to
worry about padding bits, so I left those.)

(The same could be done with appropriate memset() calls, but this
patch is part of an effort to phase out MemSet(), so it doesn't touch
memset() calls.)

Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com
2022-07-16 08:50:49 +02:00
Fujii Masao 62c46eee22 Add checkpoint and REDO LSN to log_checkpoints message.
It is useful for debugging purposes to report the checkpoint LSN and
REDO LSN in log_checkpoints message. It can give more context while
analyzing checkpoint-related issues. pg_controldata reports the last
checkpoint LSN and REDO LSN, but having this information alongside
the log message helps analyze issues that happened previously,
connect the dots and identify the root cause.

Author: Bharath Rupireddy, Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Julien Rouhaud, Nathan Bossart, Fujii Masao, Greg Stark
Discussion: https://postgr.es/m/CALj2ACWt6kqriAHrO+AJj+OmP=suwbktHT5JoYAn-nqZe2gd2g@mail.gmail.com
2022-07-07 22:37:54 +09:00
David Rowley 0229106afa Overload index_form_tuple to allow the memory context to be supplied
40af10b57 changed things so we make use of a generation memory context for
storing tuples to be sorted by tuplesort.c. That change does not play
nicely with the changes made in 9f03ca915 (back in 2014). That commit
changed things so that index_form_tuple() is called while switched into
the tuplestore's tuplecontext. In order to fetch the tuple from the index,
index_form_tuple() must do various memory allocations which are unrelated
to the storage of the final returned tuple. Although all of these
allocations are pfree'd, the fact that we now use a generation context
means that the memory for these pfree'd allocations won't be used again by
any other allocation due to generation.c's lack of freelists.  This could
result in sorts used for building indexes exceeding maintenance_work_mem
by a very large amount.

Here we fix it so we no longer allocate anything apart from the tuple
itself into the generation context by adding a new version of
index_form_tuple named index_form_tuple_context, which can be called to
specify the MemoryContext to allocate the tuple into.

Discussion: https://postgr.es/m/CAApHDvrHQkiFRHiGiAS-LMOvJN-eK-s762=tVzBz8ZqUea-a_A@mail.gmail.com
Backpatch-through: 15, where 40af10b57 was added.
2022-07-07 08:14:00 +12:00
Robert Haas b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Peter Eisentraut 16d52fc89d Refactor sending of DataRow messages in replication protocol
Some routines open-coded the construction of DataRow messages.  Use
TupOutputState struct and associated functions instead, which was
already done in some places.

SendTimeLineHistory() is a bit more complicated and isn't converted by
this.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/7e4fdbdc-699c-4cd0-115d-fb78a957fc22@enterprisedb.com
2022-07-06 08:42:56 +02:00
Michael Paquier dac1ff3090 Replace durable_rename_excl() by durable_rename(), take two
durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), and it falls back to rename() on some
platforms (aka WIN32), which offers no such overwrite protection.  Most
callers use durable_rename_excl() just in case there is an existing
file, but in practice there shouldn't be one (see below for more
details).

Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file.  As per Nathan's tests, it is
possible to end up with two links to the same file in pg_wal after a
crash just before unlink() during WAL recycling.  Specifically, the test
produced links to the same file for the current WAL file and the next
one because the half-recycled WAL file was re-recycled upon restarting,
leading to WAL corruption.

This change replaces all the calls of durable_rename_excl() to
durable_rename().  This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it and ordinarily there shouldn't be one.  The function itself
is left around in case any extensions are using it.  It will be removed
on HEAD via a follow-up commit.

Here is a summary of the existing callers of durable_rename_excl() (see
second discussion link at the bottom), replaced by this commit.  First,
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms.  Second, xlog.c uses it to recycle
past WAL segments, where an overwrite should not happen (origin of the
change at f0e37a8) because there are protections about the WAL segment
to select when recycling an entry.  The third and last area is related
to the write of timeline history files.  writeTimeLineHistory() will
write a new timeline history file at the end of recovery on promotion,
so there should be no such files for the same timeline.
What remains is writeTimeLineHistoryFile(), that can be used in parallel
by a WAL receiver and the startup process, and some digging of the
buildfarm shows that EEXIST from a WAL receiver can happen with an error
of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\",
which would cause an automatic restart of the WAL receiver as it is
promoted to FATAL, hence this should improve the stability of the WAL
receiver as rename() would overwrite an existing TLI history file
already fetched by the startup process at recovery.

This is a bug fix, but knowing the unlikeliness of the problem involving
one or more crashes at an exceptionally bad moment, no backpatch is
done.  Also, I want to be careful with such changes (aaa3aed did the
opposite of this change by removing HAVE_WORKING_LINK so as Windows
would do a link() rather than a rename() but this was not
concurrent-safe).  A backpatch could be revisited in the future.  This
is the second time this change is attempted, ccfbd92 being the first
one, but this time no assertions are added for the case of a TLI history
file written concurrently by the WAL receiver or the startup process
because we can expect one to exist (some of the TAP tests are able to
trigger with a proper timing).

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz
2022-07-05 10:16:12 +09:00
Peter Eisentraut 2ce648f750 Refactor sending of RowDescription messages in replication protocol
Some routines open-coded the construction of RowDescription messages.
Instead, we have support for doing this using tuple descriptors and
DestRemoteSimple, so use that instead.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/7e4fdbdc-699c-4cd0-115d-fb78a957fc22@enterprisedb.com
2022-07-04 19:43:58 +02:00