Commit Graph

25412 Commits

Author SHA1 Message Date
Robert Haas d3ae2a24f2 Add allow_alter_system GUC.
This is marked PGC_SIGHUP, so it can only be set in a configuration
file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE,
so it can't be set using ALTER SYSTEM. When set to false, the
ALTER SYSTEM command is disallowed.

There was considerable concern that this would be misinterpreted as
a security feature, which it is not, because a determined superuser
has various ways of bypassing it. Hence, a lot of work has gone into
wordsmithing the documentation, in the hopes of avoiding any such
confusion.

Jelte Fennemia-Nio and Gabriele Bartolini, with wording suggestions
for the documentation from many others.

Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com
2024-03-29 08:45:11 -04:00
Tom Lane 0075d78947 Allow "internal" subtransactions in parallel mode.
Allow use of BeginInternalSubTransaction() in parallel mode, so long
as the subtransaction doesn't attempt to acquire an XID or increment
the command counter.  Given those restrictions, the other parallel
processes don't need to know about the subtransaction at all, so
this should be safe.  The benefit is that it allows subtransactions
intended for error recovery, such as pl/pgsql exception blocks,
to be used in PARALLEL SAFE functions.

Another reason for doing this is that the API of
BeginInternalSubTransaction() doesn't allow reporting failure.
pl/python for one, and perhaps other PLs, copes very poorly with an
error longjmp out of BeginInternalSubTransaction().  The headline
feature of this patch removes the only easily-triggerable failure
case within that function.  There remain some resource-exhaustion
and similar cases, which we now deal with by promoting them to FATAL
errors, so that callers need not try to clean up.  (It is likely
that such errors would leave us with corrupted transaction state
inside xact.c, making recovery difficult if not impossible anyway.)

Although this work started because of a report of a pl/python crash,
we're not going to do anything about that in the back branches.
Back-patching this particular fix is obviously not very wise.
While we could contemplate some narrower band-aid, pl/python is
already an untrusted language, so it seems okay to classify this
as a "so don't do that" case.

Patch by me, per report from Hao Zhang.  Thanks to Robert Haas for
review.

Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com
2024-03-28 12:43:10 -04:00
Alvaro Herrera e2395cdbe8
ALTER TABLE: rework determination of access method ID
Avoid setting an access method OID for relation kinds that don't take
one.  Code review for new feature added in 374c7a2290.

Author: Justin Pryzby <pryzby@telsasoft.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/e5516ac1-5264-c3c0-d822-9e6f614ea93b@gmail.com
2024-03-28 16:51:20 +01:00
Tom Lane be98a550cc Update comment in set_dummy_rel_pathlist().
This comment claimed that set_dummy_rel_pathlist() has callers
other than (possibly indirectly) set_rel_size().  It doesn't,
so revise the argument to not rely on that.

Noted by Richard Guo.

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
2024-03-28 11:44:00 -04:00
Heikki Linnakangas 427005742b Remove obsolete comment about VACUUM retrying pruning
Commit 1ccc1e05ae removed the retry logic that the comment talked
about.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov
2024-03-28 10:18:48 +02:00
Masahiko Sawada 2d8f56dabb Rethink create and attach APIs of shared TidStore.
Previously, the behavior of TidStoreCreate() was inconsistent between
local and shared TidStore instances in terms of memory limitation. For
local TidStore, a memory context was created with initial and maximum
memory block sizes, as well as a minimum memory context size, based on
the specified max_bytes values. However, for shared TidStore, the
provided DSA area was used for TID storage. Although commit bb952c8c8b
allowed specifying the initial and maximum DSA segment sizes, callers
would have needed to clamp their own limits, which was not consistent
and user-friendly.

With this commit, when creating a shared TidStore, a dedicated DSA
area is created for TID storage instead of using a provided DSA
area. The initial and maximum DSA segment sizes are chosen based on
the specified max_bytes. Other processes can attach to the shared
TidStore using the handle of the created DSA returned by the new
TidStoreGetDSA() function and the DSA pointer returned by
TidStoreGetHandle(). The created DSA has the same lifetime as the
shared TidStore and is deleted when all processes detach from it.

To improve clarity, the TidStoreCreate() function has been divided
into two separate functions: TidStoreCreateLocal() and
TidStoreCreateShared().

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com
2024-03-28 10:03:28 +09:00
Tom Lane a767cdc84c Fix unnecessary use of moving-aggregate mode with non-moving frame.
When a plain aggregate is used as a window function, and the window
frame start is specified as UNBOUNDED PRECEDING, the frame's head
cannot move so we do not need to use moving-aggregate mode.  The check
for that was put into initialize_peragg(), failing to notice that
ExecInitWindowAgg() calls that function before it's filled in
winstate->frameOptions.  Since makeNode() would have zeroed the field,
this didn't provoke uninitialized-value complaints, nor would the
erroneous decision have resulted in more than a little inefficiency.
Still, it's wrong, so move the initialization of
winstate->frameOptions earlier to make it work properly.

While here, also fix a thinko in a comment.  Both errors crept in in
commit a9d9acbf2 which introduced the moving-aggregate mode.

Spotted by Vallimaharajan G.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
2024-03-27 13:39:03 -04:00
Robert Haas de7e96bd0f Rename COMPAT_OPTIONS_CLIENT to COMPAT_OPTIONS_OTHER.
The user-facing name is "Other Platforms and Clients", but the
internal name seems too focused on clients specifically, especially
given the plan to add a new setting to this session that is about
platform or deployment model compatibility rather than client
compatibility.

Jelte Fennema-Nio

Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com
2024-03-27 10:45:28 -04:00
Dean Rasheed e6341323a8 Add functions to generate random numbers in a specified range.
This adds 3 new variants of the random() function:

    random(min integer, max integer) returns integer
    random(min bigint, max bigint) returns bigint
    random(min numeric, max numeric) returns numeric

Each returns a random number x in the range min <= x <= max.

For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.

The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.

Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com
2024-03-27 10:12:39 +00:00
Alexander Korotkov 818861eb57 Fix some typos and grammar issues from commit 87985cc925
Reported-by: Alexander Lakhin
2024-03-27 11:47:41 +02:00
Amit Kapila 6d49c8d4b4 Change last_inactive_time to inactive_since in pg_replication_slots.
Commit a11f330b55 added last_inactive_time to show the last time the slot
was inactive. But, it tells the last time that a currently-inactive slot
previously *WAS* active. This could be unclear, so we changed the name to
inactive_since.

Reported-by: Robert Haas
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com
2024-03-27 09:27:44 +05:30
Masahiko Sawada bb952c8c8b Allow specifying initial and maximum segment sizes for DSA.
Previously, the DSA segment size always started with 1MB and grew up
to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios,
such as when the caller desired a soft constraint on the total DSA
segment size, limiting it to less than 1MB.

This commit introduces the capability to specify the initial and
maximum DSA segment sizes when creating a DSA area, providing more
flexibility and control over memory usage.

Reviewed-by: John Naylor, Tomas Vondra
Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com
2024-03-27 11:43:29 +09:00
Tom Lane 9d00cf4772 Remove some redundant set_cheapest() calls.
Commit e2fa76d80 centralized the responsibility for doing
set_cheapest() for a baserel, but these functions added later
seemingly didn't get the memo.  There's no apparent reason why
we need the cheapest path for these relation types to be available
any sooner than it is for other base relation types, so delete the
duplicate calls.  Doesn't save much since there's only one path
in these cases, but it might improve clarity.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
2024-03-26 16:02:44 -04:00
Nathan Bossart d365ae7054 Optimize roles_is_member_of() with a Bloom filter.
When the list of roles gathered by roles_is_member_of() grows very
large, a Bloom filter is created to help avoid some linear searches
through the list.  The threshold for creating the Bloom filter is
set arbitrarily high and may require future adjustment.

Suggested-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com
2024-03-26 14:43:37 -05:00
Tom Lane fad3b5b5ac Fix failure of ALTER FOREIGN TABLE SET SCHEMA to move sequences.
Ordinary ALTER TABLE SET SCHEMA will also move any owned sequences
into the new schema.  We failed to do likewise for foreign tables,
because AlterTableNamespaceInternal believed that only certain
relkinds could have indexes, owned sequences, or constraints.
We could simply add foreign tables to that relkind list, but it
seems likely that the same oversight could be made again in
future.  Instead let's remove the relkind filter altogether.
These functions shouldn't cost much when there are no objects
that they need to process, and surely this isn't an especially
performance-critical case anyway.

Per bug #18407 from Vidushi Gupta.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18407-4fd07373d252c6a0@postgresql.org
2024-03-26 15:28:31 -04:00
Tom Lane a65724dfa7 Propagate pathkeys from CTEs up to the outer query.
If we know the sort order of a CTE's output, and it is relevant
to the outer query, label the CTE's outer-query access path using
those pathkeys.  This may enable optimizations such as avoiding
a sort in the outer query.

The code for hoisting pathkeys into the outer query already exists
for regular RTE_SUBQUERY subqueries, but it wasn't getting used for
CTEs, possibly out of concern for maintaining an optimization fence
between the CTE and the outer query.  However, on the same arguments
used for commit f7816aec2, there seems no harm in letting the outer
query know what the inner query decided to do.

In support of this, we now remember the best Path as well as Plan
for each subquery for the rest of the planner run.  There may be
future applications for having that at hand, and it surely costs
little to build one more List.

Richard Guo (minor mods by me)

Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com
2024-03-26 13:05:51 -04:00
Bruce Momjian e648e77e25 C comment: mention no doc for negative start of substring(text)
Also add URL to hackers discussion.

Backpatch-through: master
2024-03-26 12:27:41 -04:00
Tom Lane 8a92b70c11 Allow "make check"-style testing to work with musl C library.
The musl dynamic linker saves a pointer to the process' environment
value of LD_LIBRARY_PATH very early in startup.  When we move/clobber
the environment to make more room for ps status strings, we clobber
that value and thereby prevent libraries from being found via
LD_LIBRARY_PATH, which breaks the use of a temporary installation
for testing purposes.  To fix, stop collecting usable space for
ps status if we notice that the variable we are about to clobber
is LD_LIBRARY_PATH.  This will result in some reduction in how long
the ps status can be, but it's only likely to occur in temporary
test contexts, so it doesn't seem like a big problem.  In any case,
we don't have to do it if we see we are on glibc, which surely is
where the majority of our Linux testing is done.

Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang
Walther.  Back-patch to all supported branches, with the hope that
we'll set up a buildfarm animal to test on this platform.

Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de
2024-03-26 11:44:49 -04:00
Peter Eisentraut 89e5ef7e21 Remove ObjectClass type
ObjectClass is an enum whose values correspond to catalog OIDs.  But
the extra layer of redirection, which is used only in small parts of
the code, and the similarity to ObjectType, are confusing and
cumbersome.

One advantage has been that some switches processing the OCLASS enum
don't have "default:" cases.  This is so that the compiler tells us
when we fail to add support for some new object class.  But you can
also handle that with some assertions and proper test coverage.  It's
not even clear how strong this benefit is.  For example, in
AlterObjectNamespace_oid(), you could still put a new OCLASS into the
"ignore object types that don't have schema-qualified names" case, and
it might or might not be wrong.  Also, there are already various
OCLASS switches that do have a default case, so it's not even clear
what the preferred coding style should be.

Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com
2024-03-26 10:08:56 +01:00
Masahiko Sawada 4edb37e322 Fix a calculation in TidStoreCreate().
Since we expect that the max_bytes is in bytes, not in kilobytes, it
should not be multiplied by 1024.

Introduced by 30e144287a.

Reported-by: John Naylor, David Rowley
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CANWCAZZTE-14ofsucofTuhFsfuDGBNf%3DNZb22TMYT8bxA41oQQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvojg82NDaDEpj1WEZSbVTafj%3DDRmW%2BFrkBdW8ScL4OFxA%40mail.gmail.com
2024-03-26 13:06:06 +09:00
Alexander Korotkov 41d3780d3d Improve error message for tts_(virtual|minimal)_is_current_xact_tuple
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov
2024-03-26 01:55:22 +02:00
Alexander Korotkov 10baee0c95 Add comments on some MinimalTupleSlots methods usage
Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov
2024-03-26 01:53:34 +02:00
Alexander Korotkov 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
Currently, in read committed transaction isolation mode (default), we have the
following sequence of actions when tuple_update()/tuple_delete() finds
the tuple updated by the concurrent transaction.

1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which
   returns TM_Updated.
2. Lock tuple with tuple_lock().
3. Re-evaluate plan qual (recheck if we still need to update/delete and
   calculate the new tuple for update).
4. Second attempt to update/delete tuple with tuple_update()/tuple_delete().
   This attempt should be successful, since the tuple was previously locked.

This commit eliminates step 2 by taking the lock during the first
tuple_update()/tuple_delete() call.  The heap table access method saves some
effort by checking the updated tuple once instead of twice.  Future
undo-based table access methods, which will start from the latest row version,
can immediately place a lock there.

Also, this commit makes tuple_update()/tuple_delete() optionally save the old
tuple into the dedicated slot.  That saves efforts on re-fetching tuples in
certain cases.

The code in nodeModifyTable.c is simplified by removing the nested switch/case.

Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
Reviewed-by: Andres Freund, Chris Travers
2024-03-26 01:27:56 +02:00
Tom Lane c7076ba6ad Refactor predicate_{implied,refuted}_by_simple_clause.
Put the node-type-dependent operations into switches on nodeTag.
This should ease addition of new proof rules for other expression
node types.  There is no functional change, although some tests
are made in a different order than before.

Also, add a couple of new cross-checks in test_predtest.c.

James Coleman (part of a larger patch series)

Discussion: https://postgr.es/m/CAAaqYe8Bo4bf_i6qKj8KBsmHMYXhe3Xt6vOe3OBQnOaf3_XBWg@mail.gmail.com
2024-03-25 17:45:15 -04:00
Jeff Davis bc5fcaa289 Clarify comment for LogicalTapeSetBlocks().
Discussion: https://postgr.es/m/1229327.1711160246@sss.pgh.pa.us
Backpatch-through: 13
2024-03-25 11:51:44 -07:00
Alvaro Herrera 374c7a2290
Allow specifying an access method for partitioned tables
It's now possible to specify a table access method via
CREATE TABLE ... USING for a partitioned table, as well change it with
ALTER TABLE ... SET ACCESS METHOD.  Specifying an AM for a partitioned
table lets the value be used for all future partitions created under it,
closely mirroring the behavior of the TABLESPACE option for partitioned
tables.  Existing partitions are not modified.

For a partitioned table with no AM specified, any new partitions are
created with the default_table_access_method.

Also add ALTER TABLE ... SET ACCESS METHOD DEFAULT, which reverts to the
original state of using the default for new partitions.

The relcache of partitioned tables is not changed: rd_tableam is not
set, even if a partitioned table has a relam set.

Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Author: Michaël Paquier <michael@paquier.xyz>
Reviewed-by: The authors themselves
Discussion: https://postgr.es/m/CAE-ML+9zM4wJCGCBGv01k96qQ3gFv4WFcFy=zqPHKeaEFwwv6A@mail.gmail.com
Discussion: https://postgr.es/m/20210308010707.GA29832%40telsasoft.com
2024-03-25 16:30:36 +01:00
Daniel Gustafsson 64e401b62b Fix indentation from a11f330b5
Per buildfarm animal koel
2024-03-25 14:18:33 +01:00
Heikki Linnakangas f83d709760 Merge prune, freeze and vacuum WAL record formats
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.

This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier analyze pruning and vacuuming with
pg_waldump.

The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
2024-03-25 14:59:58 +02:00
Amit Kapila a11f330b55 Track last_inactive_time in pg_replication_slots.
This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

The 'last_inactive_time' will be useful on production servers to debug and
analyze inactive replication slots. It will also help to know the lifetime
of a replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down.

The 'last_inactive_time' will also be useful to implement inactive
timeout-based replication slot invalidation in a future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-25 16:34:33 +05:30
Amit Langote 0f7863afef Code review for 6190d828cd
* Fix the comment of init_dummy_sjinfo() to remove references to
  non-existing parameters 'rel1' and 'rel2'.

* Adjust consider_new_or_clause() to call init_dummy_sjinfo() to make
  up a SpecialJoinInfo for inner joins like other code sites that
  were adjusted in 6190d828cd to do so.

Author: Richard Guo <guofenglinux@gmail.com>
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 19:45:27 +09:00
Amit Langote 6190d828cd Do not translate dummy SpecialJoinInfos for child joins
This teaches build_child_join_sjinfo() to create the dummy
SpecialJoinInfos (those created for inner joins) directly for a given
child join, skipping the unnecessary overhead of translating the
parent joinrel's SpecialJoinInfo.

To that end, this commit moves the code to initialize the dummy
SpecialJoinInfos to a new function named init_dummy_sjinfo() and
changes the few existing sites that have this code and
build_child_join_sjinfo() to call this new function.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 18:06:47 +09:00
Amit Langote 5278d0a2e8 Reduce memory used by partitionwise joins
Specifically, this commit reduces the memory consumed by the
SpecialJoinInfos that are allocated for child joins in
try_partitionwise_join() by freeing them at the end of creating paths
for each child join.

A SpecialJoinInfo allocated for a given child join is a copy of the
parent join's SpecialJoinInfo, which contains the translated copies
of the various Relids bitmapsets and semi_rhs_exprs, which is a List
of Nodes.  The newly added freeing step frees the struct itself and
the various bitmapsets, but not semi_rhs_exprs, because there's no
handy function to free the memory of Node trees.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 18:06:46 +09:00
David Rowley 66c0185a3d Allow planner to use Merge Append to efficiently implement UNION
Until now, UNION queries have often been suboptimal as the planner has
only ever considered using an Append node and making the results unique
by either using a Hash Aggregate, or by Sorting the entire Append result
and running it through the Unique operator.  Both of these methods
always require reading all rows from the union subqueries.

Here we adjust the union planner so that it can request that each subquery
produce results in target list order so that these can be Merge Appended
together and made unique with a Unique node.  This can improve performance
significantly as the union child can make use of the likes of btree
indexes and/or Merge Joins to provide the top-level UNION with presorted
input.  This is especially good if the top-level UNION contains a LIMIT
node that limits the output rows to a small subset of the unioned rows as
cheap startup plans can be used.

Author: David Rowley
Reviewed-by: Richard Guo, Andy Fan
Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
2024-03-25 14:31:14 +13:00
Tom Lane af1d395843 Allow more cases to pass the unsafe-use-of-new-enum-value restriction.
Up to now we've rejected cases like

BEGIN;
CREATE TYPE rainbow AS ENUM ();
ALTER TYPE rainbow ADD VALUE 'red';
-- use the value 'red', perhaps in a constraint or index
COMMIT;

The concern is that the uncommitted enum value 'red' might get into
an index and then break the index if we roll back the ALTER ADD.
If the ALTER is in the same transaction as the CREATE then it's really
perfectly safe, but we weren't taking the trouble to identify that.

pg_dump in binary-upgrade mode will emit enum definitions that look
like the above, which up to now didn't fall foul of the unsafe-usage
check because we processed each restore command as a separate
transaction.  However an upcoming patch proposes to bundle the restore
commands into large transactions to reduce XID consumption during
pg_upgrade, and that makes this behavior a problem.

To fix, remember the OIDs of enum types created in the current
transaction, and allow use of enum values that are added to one later
in the same transaction.  To do this fully correctly in the presence
of subtransactions, we'd have to track subtransaction nesting level of
the CREATE and do maintenance work at every subsequent subtransaction
exit.  That seems expensive, and we don't need it to satisfy pg_dump's
usage.  Hence, apply the additional optimization only when the CREATE
and ALTER are at outermost transaction level.

Patch by me, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/1548468.1711220438@sss.pgh.pa.us
2024-03-24 14:30:29 -04:00
Peter Eisentraut 34768ee361 Add temporal FOREIGN KEY contraints
Add PERIOD clause to foreign key constraint definitions.  This is
supported for range and multirange types.  Temporal foreign keys check
for range containment instead of equality.

This feature matches the behavior of the SQL standard temporal foreign
keys, but it works on PostgreSQL's native ranges instead of SQL's
"periods", which don't exist in PostgreSQL (yet).

Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT}
are not supported yet.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-03-24 07:37:13 +01:00
Daniel Gustafsson 697f8d266c Revert "Add notBefore and notAfter to SSL cert info display"
This reverts commit 6acb0a628e since
LibreSSL didn't support ASN1_TIME_diff until OpenBSD 7.1, leaving
the older OpenBSD animals in the buildfarm complaining.

Per plover in the buildfarm.

Discussion: https://postgr.es/m/F0DF7102-192D-4C21-96AE-9A01AE153AD1@yesql.se
2024-03-22 22:58:41 +01:00
Tom Lane 473182c952 Use a hash table for catcache.c's CatCList objects.
Up to now, all of the "catcache list" objects within a catalog cache
were just chained together on a single dlist, requiring O(N) time to
search.  Remarkably, we've not had serious performance problems with
that so far; but we got a complaint of a bad performance regression
from v15 in a case with a large number of roles in the system, which
traced down to O(N^2) total time when we probed N catcache lists.

Replace that data structure with a hashtable having an enlargeable
number of dlists, in an exactly parallel way to the data structure
we've used for years for the plain CatCTup cache members.  The extra
cost of maintaining a hash table seems negligible, since we were
already computing a hash value for list searches.

Normally this'd be HEAD-only material, but in view of the performance
regression it seems advisable to back-patch into v16.  In the v16
version of the patch, leave the dead cc_lists field where it is and
add the new fields at the end of struct catcache, to avoid possible
ABI breakage in case any external code is looking at these structs.
(We assume no external code is actually allocating new catcache
structs.)

Per report from alex work.

Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz=vggErdSG7pv2s6vmmTOLJSg@mail.gmail.com
2024-03-22 17:13:53 -04:00
Daniel Gustafsson 6acb0a628e Add notBefore and notAfter to SSL cert info display
This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate. OpenSSL has
APIs for extracting notAfter and notBefore, but they are only
supported in recent versions so we have to calculate the dates
by hand in order to make this work for the older versions of
OpenSSL that we still support.

Original patch by Cary Huang with additional hacking by Jacob
and myself.

Author: Cary Huang <cary.huang@highgo.ca>
Co-author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca
2024-03-22 21:25:25 +01:00
Alexander Korotkov b670b93a66 Fix an oversight in refactoring in 06b10f80ba.
It was against intended skipping prechecking keys optimization in the
first page of range queries to not influence point queries performance.

Reported-by: Anton Melnikov
Discussion: https://postgr.es/m/30cd7524-b9f1-4cf8-9c4a-223eb2e34441%40postgrespro.ru
Author: Pavel Borisov
2024-03-22 15:25:53 +02:00
Peter Eisentraut d20d8fbd3e Do not output actual value of location fields in node serialization by default
This changes nodeToString() to not output the actual value of location
fields in nodes, but instead it writes -1.  This mirrors the fact that
stringToNode() also does not read location field values but always
stores -1.

For most uses of nodeToString(), which is to store nodes in catalog
fields, this is more useful.  We don't store original query texts in
catalogs, so any lingering query location values are not meaningful.

For debugging purposes, there is a new nodeToStringWithLocations(),
which mirrors the existing stringToNodeWithLocations().  This is used
for WRITE_READ_PARSE_PLAN_TREES and nodes/print.c functions, which
covers all the debugging uses.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEze2WgrCiR3JZmWyB0YTc8HV7ewRdx13j0CqD6mVkYAW+SFGQ@mail.gmail.com
2024-03-22 09:49:12 +01:00
Amit Kapila 6ae701b437 Track invalidation_reason in pg_replication_slots.
Till now, the reason for replication slot invalidation is not tracked
directly in pg_replication_slots. A recent commit 007693f2a3 added
'conflict_reason' to show the reasons for slot conflict/invalidation, but
only for logical slots.

This commit adds a new column 'invalidation_reason' to show invalidation
reasons for both physical and logical slots. And, this commit also turns
'conflict_reason' text column to 'conflicting' boolean column (effectively
reverting commit 007693f2a3). The 'conflicting' column is true for
invalidation reasons 'rows_removed' and 'wal_level_insufficient' because
those make the slot conflict with recovery. When 'conflicting' is true,
one can now look at the new 'invalidation_reason' column for the reason
for the logical slot's conflict with recovery.

The new 'invalidation_reason' column will also be useful to track other
invalidation reasons in the future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-22 13:52:05 +05:30
Peter Eisentraut b4080fa3dc Make RangeTblEntry dump order consistent
Put the fields alias and eref earlier in the struct, so that it
matches the order in _outRangeTblEntry()/_readRangeTblEntry().  This
helps if we ever want to fully automate out/read of RangeTblEntry.
Also, it makes dumps in the debugger easier to read in the same way.
Internally, this makes no difference.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
2024-03-22 07:28:33 +01:00
Peter Eisentraut 367c989cd8 Remove custom _jumbleRangeTblEntry()
This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

This patch removes _jumbleRangeTblEntry() and instead adds per-field
query_jumble_ignore annotations to match the behavior of the previous
custom code.  The pg_stat_statements test suite has some coverage of
this.  It gets rid of the switch on rtekind; this should be
technically correct, since we do the equal and copy functions like
this also.

The list of fields to jumble has been checked and is considered
correct as of 8b29a119fd.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
2024-03-22 07:23:47 +01:00
Amit Langote 085e759e9d Avoid splitting errmsg string to span multiple lines
The error message being fixed was added in 6185c9737c.

While at it, add an "a" to the sentence.

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20240322.095149.895185546948714852.horikyota.ntt%40gmail.com
2024-03-22 12:04:06 +09:00
Alexander Korotkov 0997e0af27 Add TupleTableSlotOps.is_current_xact_tuple() method
This allows us to abstract how/whether table AM uses transaction identifiers.
A custom table AM can use a custom slot, which may not store xmin directly,
but determine the tuple belonging to the current transaction in the other way.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21 23:00:43 +02:00
Alexander Korotkov c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
This allows table AM to return a native tuple slot even if
VirtualTupleTableSlot is given as an input.  Native tuple slots have knowledge
about system attributes, which could be accessed in the future.
table_multi_insert() method already can modify the input 'slots' array.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21 23:00:40 +02:00
Alexander Korotkov 02eb07ea89 Allow table AM to store complex data structures in rd_amcache
The new table AM method free_rd_amcache is responsible for freeing all the
memory related to rd_amcache and setting free_rd_amcache to NULL.  If the new
method is not specified, we still assume rd_amcache to be a single chunk of
memory, which could be just pfree'd.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li
2024-03-21 23:00:34 +02:00
Amit Langote 6185c9737c Add SQL/JSON query functions
This introduces the following SQL/JSON functions for querying JSON
data using jsonpath expressions:

JSON_EXISTS(), which can be used to apply a jsonpath expression to a
JSON value to check if it yields any values.

JSON_QUERY(), which can be used to to apply a jsonpath expression to
a JSON value to get a JSON object, an array, or a string.  There are
various options to control whether multi-value result uses array
wrappers and whether the singleton scalar strings are quoted or not.

JSON_VALUE(), which can be used to apply a jsonpath expression to a
JSON value to return a single scalar value, producing an error if it
multiple values are matched.

Both JSON_VALUE() and JSON_QUERY() functions have options for
handling EMPTY and ERROR conditions, which can be used to specify
the behavior when no values are matched and when an error occurs
during jsonpath evaluation, respectively.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Jian He <jian.universality@gmail.com>

Reviewers have included (in no particular order):

Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov,
Nikita Malakhov, Peter Eisentraut, Tomas Vondra

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-03-21 17:07:03 +09:00
Masahiko Sawada 30e144287a Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently.
TIDStore is a data structure designed to efficiently store large sets
of TIDs. For TID storage, it employs a radix tree, where the key is
a block number, and the value is a bitmap representing offset
numbers. The TIDStore can be created on a DSA area and used by
multiple backend processes simultaneously.

There are potential future users such as tidbitmap.c, though it's very
likely the interface will need to evolve as we come to understand the
needs of different kinds of users. For example, we can support
updating the offset bitmap of existing values.

Currently, the TIDStore is not used for anything yet, aside from the
test code. But an upcoming patch will use it.

This includes a unit test module, in src/test/modules/test_tidstore.

Co-authored-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-21 10:08:42 +09:00
Tom Lane 995e0fbc1c Un-break genbki.pl's error reporting capabilities.
This essentially reverts commit 69eb643b2, which added a fast path
in Catalog::ParseData, but neglected to preserve the behavior of
adding a line_number field in each hash.  That makes it impossible
for genbki.pl to provide any localization of error reports, which is
bad enough; but actually the affected error reports failed entirely,
producing useless bleats like "use of undefined value in sprintf".

69eb643b2 claimed to get a 15% speedup, but I'm not sure I believe
that: the time to rebuild the bki files changes by less than 1% for
me.  In any case, making debugging of mistakes in .dat files more
difficult would not be justified by even an order of magnitude
speedup here; it's just not that big a chunk of the total build time.

Per report from David Wheeler.  Although it's also broken in v16,
I don't think this is worth a back-patch, since we're very unlikely
to touch the v16 catalog data again.

Discussion: https://postgr.es/m/19238.1710953049@sss.pgh.pa.us
2024-03-20 18:02:50 -04:00
Tom Lane 1218ca9956 Add to_regtypemod function to extract typemod from a string type name.
In combination with to_regtype, this allows converting a string to
the "canonicalized" form emitted by format_type.  That usage requires
parsing the string twice, which is slightly annoying but not really
too expensive.  We considered alternatives such as returning a record
type, but that way was notationally uglier than this, and possibly
less flexible.

Like to_regtype(), we'd rather that this return NULL for any bad
input, but the underlying type-parsing logic isn't yet capable of
not throwing syntax errors.  Adjust the documentation for both
functions to point that out.

In passing, fix up a couple of nearby entries in the System Catalog
Information Functions table that had not gotten the word about our
since-v13 convention for displaying function usage examples.

David Wheeler and Erik Wienhold, reviewed by Pavel Stehule, Jim Jones,
and others.

Discussion: https://postgr.es/m/DF2324CA-2673-4ABE-B382-26B5770B6AA3@justatheory.com
2024-03-20 17:11:28 -04:00
Nathan Bossart 80686761c4 Avoid overflow in MaybeRemoveOldWalSummaries().
This commit limits the maximum value of wal_summary_keep_time to
INT_MAX / SECS_PER_MINUTE to avoid overflow when it is converted to
seconds.  In passing, use the HOURS_PER_DAY, MINS_PER_HOUR, and
SECS_PER_MINUTE macros in the code for this GUC instead of hard-
coding those values.

Discussion: https://postgr.es/m/20240314210010.GA3056455%40nathanxps13
2024-03-20 13:31:58 -05:00
Nathan Bossart 2b520860c0 Revert "Temporary patch to help debug pg_walsummary test failures."
Thanks to commits ea18eb7d62, b6ee30ec08, and 19a829a327, the
002_blocks.pl test now consistently passes, so we can remove this
temporary debugging code.

This reverts commit 5ddf997347.

Discussion: https://postgr.es/m/20240314210010.GA3056455%40nathanxps13
2024-03-20 11:34:00 -05:00
Alvaro Herrera a0390f6ca6
Review wording on tablespaces w.r.t. partitioned tables
Remove a redundant comment, and document pg_class.reltablespace properly
in catalogs.sgml.

After commits a36c84c3e4, 87259588d0 and others.

Backpatch to 12.

Discussion: https://postgr.es/m/202403191013.w2kr7wqlamqz@alvherre.pgsql
2024-03-20 15:28:14 +01:00
Alvaro Herrera da952b415f
Rework lwlocknames.txt to become lwlocklist.h
This way, we can fold the list of lock names to occur in
BuiltinTrancheNames instead of having its own separate array.  This
saves two lines of code in GetLWTrancheName and some space in
BuiltinTrancheNames, as foreseen in commit 74a7306310, as well as
removing the need for a separate lwlocknames.c file.

We still have to build lwlocknames.h using Perl code, which initially I
wanted to avoid, but it gives us the chance to cross-check
wait_event_names.txt.

Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql
2024-03-20 11:55:20 +01:00
Peter Eisentraut e5da0fe3c2 Catalog domain not-null constraints
This applies the explicit catalog representation of not-null
constraints introduced by b0e96f3119 for table constraints also to
domain not-null constraints.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org
2024-03-20 10:05:37 +01:00
Heikki Linnakangas c9c260decd Remove unused PruneState member rel
PruneState->rel is no longer being used, so just remove it.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov
2024-03-20 10:13:42 +02:00
Heikki Linnakangas c33084205a Reorganize heap_page_prune() function comment
heap_page_prune()'s function header comment didn't explain the
parameters in the same order they appear in the function. Fix that.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov
2024-03-20 10:13:39 +02:00
Heikki Linnakangas d63d486d6c Remove assertions that some compiler say are tautological
To avoid the compiler warnings:

    launch_backend.c:211:39: warning: comparison of constant 16 with expression of type 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare]
    launch_backend.c:233:39: warning: comparison of constant 16 with expression of type 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare]

The point of the assertions was to fail more explicitly if someone
adds a new BackendType to the end of the enum, but forgets to add it
to the child_process_kinds array. It was a pretty weak assertion to
begin with, because it wouldn't catch if you added a new BackendType
in the middle of the enum. So let's just remove it.

Per buildfarm member ayu and a few others, spotted by Tom Lane.

Discussion: https://www.postgresql.org/message-id/4119680.1710913067@sss.pgh.pa.us
2024-03-20 09:14:51 +02:00
Jeff Davis f69319f2f1 Support C.UTF-8 locale in the new builtin collation provider.
The builtin C.UTF-8 locale has similar semantics to the libc locale of
the same name. That is, code point sort order (fast, memcmp-based)
combined with Unicode semantics for character operations such as
pattern matching, regular expressions, and
LOWER()/INITCAP()/UPPER(). The character semantics are based on
Unicode simple case mappings.

The builtin provider's C.UTF-8 offers several important advantages
over libc:

 * faster sorting -- benefits from additional optimizations such as
   abbreviated keys and varstrfastcmp_c
 * faster case conversion, e.g. LOWER(), at least compared with some
   libc implementations
 * available on all platforms with identical semantics, and the
   semantics are stable, testable, and documentable within a given
   Postgres major version

Being based on memcmp, the builtin C.UTF-8 locale does not offer
natural language sort order. But it is an improvement for most use
cases that might otherwise use libc's "C.UTF-8" locale, as well as
many use cases that use libc's "C" locale.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
2024-03-19 15:24:41 -07:00
Tom Lane fd0398fcb0 Improve EXPLAIN's display of SubPlan nodes and output parameters.
Historically we've printed SubPlan expression nodes as "(SubPlan N)",
which is pretty uninformative.  Trying to reproduce the original SQL
for the subquery is still as impractical as before, and would be
mighty verbose as well.  However, we can still do better than that.
Displaying the "testexpr" when present, and adding a keyword to
indicate the SubLinkType, goes a long way toward showing what's
really going on.

In addition, this patch gets rid of EXPLAIN's use of "$n" to represent
subplan and initplan output Params.  Instead we now print "(SubPlan
N).colX" or "(InitPlan N).colX" to represent the X'th output column
of that subplan.  This eliminates confusion with the use of "$n" to
represent PARAM_EXTERN Params, and it's useful for the first part of
this change because it eliminates needing some other indication of
which subplan is referenced by a SubPlan that has a testexpr.

In passing, this adds simple regression test coverage of the
ROWCOMPARE_SUBLINK code paths, which were entirely unburdened
by testing before.

Tom Lane and Dean Rasheed, reviewed by Aleksander Alekseev.
Thanks to Chantal Keller for raising the question of whether
this area couldn't be improved.

Discussion: https://postgr.es/m/2838538.1705692747@sss.pgh.pa.us
2024-03-19 18:19:24 -04:00
Tom Lane b7e2121ab7 Postpone reparameterization of paths until create_plan().
When considering nestloop paths for individual partitions within
a partitionwise join, if the inner path is parameterized, it is
parameterized by the topmost parent of the outer rel, not the
corresponding outer rel itself.  Therefore, we need to translate the
parameterization so that the inner path is parameterized by the
corresponding outer rel.

Up to now, we did this while generating join paths.  However, that's
problematic because we must also translate some expressions that are
shared across all paths for a relation, such as restriction clauses
(kept in the RelOptInfo and/or IndexOptInfo) and TableSampleClauses
(kept in the RangeTblEntry).  The existing code fails to translate
these at all, leading to wrong answers, odd failures such as
"variable not found in subplan target list", or executor crashes.
But we can't modify them during path generation, because that would
break things if we end up choosing some non-partitioned-join path.

So this patch postpones reparameterization of the inner path until
createplan.c, where it is safe to modify the referenced RangeTblEntry,
RelOptInfo or IndexOptInfo, because we have made a final choice of which
Path to use.  We do still have to check during path generation that
the reparameterization will be possible.  So we introduce a new
function path_is_reparameterizable_by_child() to detect that.

The duplication between path_is_reparameterizable_by_child() and
reparameterize_path_by_child() is a bit annoying, but there seems
no other good answer.  A small benefit is that we can avoid building
useless reparameterized trees in cases where a non-partitioned join
is ultimately chosen.  Also, reparameterize_path_by_child() can now
be allowed to scribble on the input paths, saving a few cycles.

This fix repairs the same problems previously addressed in the
back branches by commits 62f120203 et al.

Richard Guo, reviewed at various times by Ashutosh Bapat, Andrei
Lepikhov, Alena Rybakina, Robert Haas, and myself

Discussion: https://postgr.es/m/CAMbWs496+N=UAjOc=rcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw@mail.gmail.com
2024-03-19 14:51:58 -04:00
Peter Eisentraut 605721f819 gen_node_support.pl: Mark location fields as type alias ParseLoc
Instead of the rather ugly type=int + name ~= location$, we now have a
marker type for offset pointers or sizes that are only relevant when a
query text is included, which decreases the complexity required in
gen_node_support.pl for handling these values.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEze2WgrCiR3JZmWyB0YTc8HV7ewRdx13j0CqD6mVkYAW+SFGQ@mail.gmail.com
2024-03-19 16:56:44 +01:00
Peter Eisentraut 49b579f92d Fix misleading comments
To match code changes in 229fb58d4f.
2024-03-19 10:55:51 +01:00
Peter Eisentraut 794f10f6b9 Add some UUID support functions
Add uuid_extract_timestamp() and uuid_extract_version().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
2024-03-19 09:32:04 +01:00
Jeff Davis 60769c62dc Fix another warning, introduced by 846311051e.
Discussion: https://postgr.es/m/3703896.1710799495@sss.pgh.pa.us
Reported-by: Tom Lane
2024-03-18 15:34:43 -07:00
Jeff Davis 846311051e Address more review comments on commit 2d819a08a1.
Based on comments from Peter Eisentraut.

 * Document CREATE DATABASE ... BUILTIN_LOCALE.
 * Determine required encoding based on locale name for CREATE
   COLLATION. Use -1 for "C" (requires catversion bump).
 * initdb output fixups.
 * Make ctype_is_c a constant true for now.
 * Fixups to ICU 010_create_database.pl test.

Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org
2024-03-18 11:58:13 -07:00
Jeff Davis 61f352ece9 Fix unreachable code warning from commit 2d819a08a1.
Found by Coverity.

Discussion: https://postgr.es/m/3422201.1710711993@sss.pgh.pa.us
Reported-by: Tom Lane
2024-03-18 09:15:47 -07:00
Heikki Linnakangas 0960ae1967 Fix EXPLAIN Bitmap heap scan to count pages with no visible tuples
Previously, bitmap heap scans only counted lossy and exact pages for
explain when there was at least one visible tuple on the page.

heapam_scan_bitmap_next_block() returned true only if there was a
"valid" page with tuples to be processed. However, the lossy and exact
page counters in EXPLAIN should count the number of pages represented
in a lossy or non-lossy way in the constructed bitmap, regardless of
whether or not the pages ultimately contained visible tuples.

Backpatch to all supported versions.

Author: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ%2BopUrXDRXdcfwFZGA@mail.gmail.com
2024-03-18 14:03:58 +02:00
Heikki Linnakangas 05c3980e7f Move code for backend startup to separate file
This is code that runs in the backend process after forking, rather
than postmaster. Move it out of postmaster.c for clarity.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-18 11:38:10 +02:00
Heikki Linnakangas aafc05de1b Refactor postmaster child process launching
Introduce new postmaster_child_launch() function that deals with the
differences in EXEC_BACKEND mode.

Refactor the mechanism of passing information from the parent to child
process. Instead of using different command-line arguments when
launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of startup data along with all the global
variables. The contents of that blob depend on the kind of child
process being launched. In !EXEC_BACKEND mode, we use the same blob,
but it's simply inherited from the parent to child process.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-18 11:35:30 +02:00
Heikki Linnakangas f1baed18bc Move some functions from postmaster.c to a new source file
This just moves the functions, with no other changes, to make the next
commits smaller and easier to review. The moved functions are related
to launching postmaster child processes in EXEC_BACKEND mode.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-18 11:35:05 +02:00
Heikki Linnakangas 14cddee9cc Split registration of Win32 deadchild callback to separate function
The next commit will move the internal_forkexec() function to a
different source file, but it makes sense to keep all the code related
to the win32 waitpid() emulation in postmaster.c. Split it off to a
separate function now, to make the next commit more mechanical.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-18 11:35:01 +02:00
Nathan Bossart 949300402b Initialize variables to placate compiler.
Since commit 012460ee93, some compilers have been warning that a
couple of variables may be used uninitialized.  There doesn't
appear to be any actual risk, so let's just initialize these
variables to 0 to silence the compiler warnings.

Discussion: https://postgr.es/m/20240317192927.GA3978212%40nathanxps13
2024-03-17 20:16:15 -05:00
Tom Lane 33f13168cc Mark hash_corrupted() as pg_attribute_noreturn.
Coverity started complaining about this after cc5ef90ed.
The code's not really different from before, but might
as well clarify its intent.
2024-03-17 17:54:45 -04:00
Dean Rasheed c649fa24a4 Add RETURNING support to MERGE.
This allows a RETURNING clause to be appended to a MERGE query, to
return values based on each row inserted, updated, or deleted. As with
plain INSERT, UPDATE, and DELETE commands, the returned values are
based on the new contents of the target table for INSERT and UPDATE
actions, and on its old contents for DELETE actions. Values from the
source relation may also be returned.

As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be
used as the source relation for other operations such as WITH queries
and COPY commands.

Additionally, a special function merge_action() is provided, which
returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action
executed for each row. The merge_action() function can be used
anywhere in the RETURNING list, including in arbitrary expressions and
subqueries, but it is an error to use it anywhere outside of a MERGE
query's RETURNING list.

Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera,
Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut,
and Wolfgang Walther.

Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
2024-03-17 13:58:59 +00:00
Peter Eisentraut 6a004f1be8 Add attstattarget to FormExtraData_pg_attribute
This allows setting attstattarget when a relation is created.

We make use of this by having index_concurrently_create_copy() copy
over the attstattarget values when the new index is created, instead
of having index_concurrently_swap() fix it up later.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-03-17 12:38:27 +01:00
Peter Eisentraut d939cb2fd6 Generalize handling of nullable pg_attribute columns in DDL
DDL code uses tuple descriptors to pass around pg_attribute values
during table and index creation.  But tuple descriptors don't include
the variable-length/nullable columns of pg_attribute, so they have to
be handled separately.  Right now, the attoptions field is handled in
a one-off way with a separate argument passed to
InsertPgAttributeTuples().  The other affected fields of pg_attribute
are right now not needed at relation creation time.

The goal of this patch is to generalize this to allow handling
additional variable-length/nullable columns of pg_attribute in a
similar manner.  For that, create a new struct
FormExtraData_pg_attribute, which is to be passed around in parallel
to the tuple descriptor and optionally supplies the additional
columns.  Right now, this struct only contains one field for
attoptions, so no functionality is actually changed by this.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-03-17 12:30:51 +01:00
Peter Eisentraut 012460ee93 Make stxstattarget nullable
To match attstattarget change (commit 4f622503d6).  The logic inside
CreateStatistics() is clarified a bit compared to that previous patch,
and so here we also update ATExecSetStatistics() to match.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-03-17 12:26:26 +01:00
Dean Rasheed 33e729c514 Fix EXPLAIN output for subplans in MERGE.
Given a subplan in a MERGE query, EXPLAIN would sometimes fail to
properly display expressions involving Params referencing variables in
other parts of the plan tree.

This would affect subplans outside the topmost join plan node, for
which expansion of Params would go via the top-level ModifyTable plan
node.  The problem was that "inner_tlist" for the ModifyTable node's
deparse_namespace was set to the join node's targetlist, but
"inner_plan" was set to the ModifyTable node itself, rather than the
join node, leading to incorrect results when descending to the
referenced variable.

Fix and backpatch to v15, where MERGE was introduced.

Discussion: https://postgr.es/m/CAEZATCWAv-sZuH%2BwG5xJ-%2BGt7qGNGX8wUQd3XYydMFDKgRB9nw%40mail.gmail.com
2024-03-17 10:17:11 +00:00
Peter Eisentraut 20e58105ba Separate equalRowTypes() from equalTupleDescs()
This introduces a new function equalRowTypes() that is effectively a
subset of equalTupleDescs() but only compares the number of attributes
and attribute name, type, typmod, and collation.  This is enough for
most existing uses of equalTupleDescs(), which are changed to use the
new function.  The only remaining callers of equalTupleDescs() are
those that really want to check the full tuple descriptor as such,
without concern about record or row or record type semantics.

The existing function hashTupleDesc() is renamed to hashRowType(),
because it now corresponds more to equalRowTypes().

The purpose of this change is to be clearer about the semantics of the
equality asked for by each caller.  (At least one caller had a comment
that questioned whether equalTupleDescs() was too restrictive.)  For
example, 4f622503d6 removed attstattarget from the tuple descriptor
structure.  It was not fully clear at the time how this should affect
equalTupleDescs().  Now the answer is clear: By their own definitions,
equalRowTypes() does not care, and equalTupleDescs() just compares
whatever is in the tuple descriptor but does not care why it is in
there.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org
2024-03-17 05:58:04 +01:00
Daniel Gustafsson b783186515 Add destroyStringInfo function for cleaning up StringInfos
destroyStringInfo() is a counterpart to makeStringInfo(), freeing a
palloc'd StringInfo and its data. This is a convenience function to
align the StringInfo API with the PQExpBuffer API. Originally added
in the OAuth patchset, it was extracted and committed separately in
order to aid upcoming JSON work.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com
2024-03-16 23:18:28 +01:00
Alexander Korotkov 4c2eda67f5 Fix race condition in transaction timeout TAP tests
The interruption handler within the injection point can get stuck in an
infinite loop while handling transaction timeout. To avoid this situation
we reset the timeout flag before invoking the injection point.

Author: Alexander Korotkov
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZfPchPC6oNN71X2J%40paquier.xyz
2024-03-15 14:38:22 +02:00
Heikki Linnakangas a3f349c612 Improve log messages referring to background worker processes
"Worker" could also mean autovacuum worker or slot sync worker, so
let's be more explicit.

Per Tristan Partin's suggestion.

Discussion: https://www.postgresql.org/message-id/CZM6WDX5H4QI.NZG1YUCKWLA@neon.tech
2024-03-15 13:14:38 +02:00
Michael Paquier cc5ef90edd Refactor initial hash lookup in dynahash.c
The same pattern is used three times in dynahash.c to retrieve a bucket
number and a hash bucket from a hash value.  This has popped up while
discussing improvements for the type cache, where this piece of
refactoring would become useful.

Note that hash_search_with_hash_value() does not need the bucket number,
just the hash bucket.

Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Michael Paquier
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1@sigaev.ru
2024-03-15 07:57:17 +09:00
David Rowley 4169850f0b Trim ORDER BY/DISTINCT aggregate pathkeys in gather_grouping_paths
Similar to d8a295389, trim off any PathKeys which are for ORDER BY /
DISTINCT aggregate functions from the PathKey List for the Gather Merge
paths created by gather_grouping_paths().  These additional PathKeys are
not valid to use after grouping has taken place as these PathKeys belong
to columns which are inputs to an aggregate function and, therefore are
unavailable after aggregation.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/cf63174c-8c89-3953-cb49-48f41f74941a@gmail.com
Backpatch-through: 16, where 1349d2790 was added
2024-03-15 11:54:36 +13:00
Daniel Gustafsson 4665cebc8a Login event trigger documentation wordsmithing
Minor wordsmithing on the login trigger documentation and code
comments to improve readability, as well as fixing a few small
incorrect statements in the comments.

Author: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/CAJSLCQ0aMWUh1m6E9YdjeqV61baQ=EhteJX8XOxXg8H_2Lcr0Q@mail.gmail.com
2024-03-14 23:35:35 +01:00
Tom Lane b4a71cf65d Make INSERT-from-multiple-VALUES-rows handle domain target columns.
Commit a3c7a993d fixed some cases involving target columns that are
arrays or composites by applying transformAssignedExpr to the VALUES
entries, and then stripping off any assignment ArrayRefs or
FieldStores that the transformation added.  But I forgot about domains
over arrays or composites :-(.  Such cases would either fail with
surprising complaints about mismatched datatypes, or insert unexpected
coercions that could lead to odd results.  To fix, extend the
stripping logic to get rid of CoerceToDomain if it's atop an ArrayRef
or FieldStore.

While poking at this, I realized that there's a poorly documented and
not-at-all-tested behavior nearby: we coerce each VALUES column to
the domain type separately, and rely on the rewriter to merge those
operations so that the domain constraints are checked only once.
If that merging did not happen, it's entirely possible that we'd get
unexpected domain constraint failures due to checking a
partially-updated container value.  There's no bug there, but while
we're here let's improve the commentary about it and add some test
cases that explicitly exercise that behavior.

Per bug #18393 from Pablo Kharo.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18393-65fedb1a0de9260d@postgresql.org
2024-03-14 14:57:16 -04:00
Nathan Bossart d1162cfda8 Add pg_column_toast_chunk_id().
This function returns the chunk_id of an on-disk TOASTed value.  If
the value is un-TOASTed or not on-disk, it returns NULL.  This is
useful for identifying which values are actually TOASTed and for
investigating "unexpected chunk number" errors.

Bumps catversion.

Author: Yugo Nagata
Reviewed-by: Jian He
Discussion: https://postgr.es/m/20230329105507.d764497456eeac1ca491b5bd%40sraoss.co.jp
2024-03-14 10:58:00 -05:00
Heikki Linnakangas 84c18acaf6 Remove redundant snapshot copying from parallel leader to workers
The parallel query infrastructure copies the leader backend's active
snapshot to the worker processes. But BitmapHeapScan node also had
bespoken code to pass the snapshot from leader to the worker. That was
redundant, so remove it.

The removed code was analogous to the snapshot serialization in
table_parallelscan_initialize(), but that was the wrong role model. A
parallel bitmap heap scan is more like an independent non-parallel
bitmap heap scan in each parallel worker as far as the table AM is
concerned, because the coordination is done in nodeBitmapHeapscan.c,
and the table AM doesn't need to know anything about it.

This relies on the assumption that es_snapshot ==
GetActiveSnapshot(). That's not a new assumption, things would get
weird if you used the QueryDesc's snapshot for visibility checks in
the scans, but the active snapshot for evaluating quals, for
example. This could use some refactoring and cleanup, but for now,
just add some assertions.

Reviewed-by: Dilip Kumar, Robert Haas
Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi
2024-03-14 15:18:10 +02:00
Robert Haas 2346df6fc3 Allow a no-wait lock acquisition to succeed in more cases.
We don't determine the position at which a process waiting for a lock
should insert itself into the wait queue until we reach ProcSleep(),
and we may at that point discover that we must insert ourselves ahead
of everyone who wants a conflicting lock, in which case we obtain the
lock immediately. Up until now, a no-wait lock acquisition would fail
in such cases, erroneously claiming that the lock couldn't be obtained
immediately.  Fix that by trying ProcSleep even in the no-wait case.

No back-patch for now, because I'm treating this as an improvement to
the existing no-wait feature. It could instead be argued that it's a
bug fix, on the theory that there should never be any case whatsoever
where no-wait fails to obtain a lock that would have been obtained
immediately without no-wait, but I'm reluctant to interpret the
semantics of no-wait that strictly.

Robert Haas and Jingxian Li

Discussion: http://postgr.es/m/CA+TgmobCH-kMXGVpb0BB-iNMdtcNkTvcZ4JBxDJows3kYM+GDg@mail.gmail.com
2024-03-14 08:56:06 -04:00
Alexander Korotkov eeefd4280f Add TAP tests for timeouts
This commit adds new tests to verify that transaction_timeout,
idle_session_timeout, and idle_in_transaction_session_timeout work as expected.
We introduce new injection points in before throwing a timeout FATAL error
and check these injection points are reached.

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin
Reviewed-by: Alexander Korotkov
2024-03-14 13:12:15 +02:00
Alexander Korotkov e85662df44 Fix false reports in pg_visibility
Currently, pg_visibility computes its xid horizon using the
GetOldestNonRemovableTransactionId().  The problem is that this horizon can
sometimes go backward.  That can lead to reporting false errors.

In order to fix that, this commit implements a new function
GetStrictOldestNonRemovableTransactionId().  This function computes the xid
horizon, which would be guaranteed to be newer or equal to any xid horizon
computed before.

We have to do the following to achieve this.

1. Ignore processes xmin's, because they consider connection to other databases
   that were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware. At the same
   time, the primary could compute its horizons database-aware.
3. Ignore walsender xmin, because it could go backward if some replication
   connections don't use replication slots.

As a result, we're using only currently running xids to compute the horizon.
Surely these would significantly sacrifice accuracy.  But we have to do so to
avoid reporting false errors.

Inspired by earlier patch by Daniel Shelepanov and the following discussion
with Robert Haas and Tom Lane.

Discussion: https://postgr.es/m/1649062270.289865713%40f403.i.mail.ru
Reviewed-by: Alexander Lakhin, Dmitry Koval
2024-03-14 13:12:05 +02:00
Amit Kapila 9c40db3b02 Fix typos in reorderbuffer.c.
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20240314.132817.1496502692848380820.horikyota.ntt@gmail.com
2024-03-14 12:12:55 +05:30
Jeff Davis 2d819a08a1 Introduce "builtin" collation provider.
New provider for collations, like "libc" or "icu", but without any
external dependency.

Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.

The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.

By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
2024-03-13 23:33:44 -07:00
Peter Eisentraut 6ab2e8385d Put genbki.pl output into src/include/catalog/ directly
With the makefile rules, the output of genbki.pl was written to
src/backend/catalog/, and then the header files were linked to
src/include/catalog/.

This changes it so that the output files are written directly to
src/include/catalog/.  This makes the logic simpler, and it also makes
the behavior consistent with the meson build system.  Also, the list
of catalog files is now kept in parallel in
src/include/catalog/{meson.build,Makefile}, while before the makefiles
had it in src/backend/catalog/Makefile.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/21b74bdc-183d-4dd5-9c27-9378d178f459@eisentraut.org
2024-03-14 07:11:21 +01:00
Nathan Bossart ecb0fd3372 Reintroduce MAINTAIN privilege and pg_maintain predefined role.
Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX,
REFRESH MATERIALIZE VIEW, CLUSTER, and LOCK TABLE on the relation.
Roles with privileges of pg_maintain may run those same commands on
all relations.

This was previously committed for v16, but it was reverted in
commit 151c22deee due to concerns about search_path tricks that
could be used to escalate privileges to the table owner.  Commits
2af07e2f74, 59825d1639, and c7ea3f4229 resolved these concerns by
restricting search_path when running maintenance commands.

Bumps catversion.

Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13
2024-03-13 14:49:26 -05:00
Robert Haas 2041bc4276 Add the system identifier to backup manifests.
Before this patch, if you took a full backup on server A and then
tried to use the backup manifest to take an incremental backup on
server B, it wouldn't know that the manifest was from a different
server and so the incremental backup operation could potentially
complete without error. When you later tried to run pg_combinebackup,
you'd find out that your incremental backup was and always had been
invalid. That's poor timing, because nobody likes finding out about
backup problems only at restore time.

With this patch, you'll get an error when trying to take the (invalid)
incremental backup, which seems a lot nicer.

Amul Sul, revised by me. Review by Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoYLZzbSAMM3cAjV4Y+iCRZn-bR9H2+Mdz7NdaJFU1Zb5w@mail.gmail.com
2024-03-13 15:12:33 -04:00
Peter Eisentraut 97d85be365 Make the order of the header file includes consistent
Similar to commit 7e735035f2.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
2024-03-13 15:07:00 +01:00
Michael Paquier 77cf6a78de Add some asserts based on LWLockHeldByMe() for replication slot statistics
Two assertions checking that ReplicationSlotAllocationLock is acquired
are added to pgstat_create_replslot() and pgstat_drop_replslot(),
corresponding to the routines in charge of the creation and the drop of
replication slot statistics.  The code previously relied on this
assumption and documented it in comments, but did not enforce this
policy at runtime.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Ze_p-hmD_yFeVYXg@paquier.xyz
2024-03-13 07:45:11 +09:00
Tom Lane 6ee3261e9b Fix confusion about the return rowtype of SQL-language procedures.
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)

As far as I know, that doesn't cause any problems in ordinary
functions.  It's disastrous for procedures however.  All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite.  Doing something else leads to an assertion failure
or core dump.

This is simple enough to fix: we just need to not apply that rule
when considering procedures.  However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers.  Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.

Per report from Yahor Yuzefovich.  This has been broken since we
implemented procedures, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
2024-03-12 18:16:25 -04:00
Heikki Linnakangas cb9663e20d Fix copying SockAddr struct
Valgrind alerted about accessing uninitialized bytes after commit
4945e4ed4a:

==700242== VALGRINDERROR-BEGIN
==700242== Conditional jump or move depends on uninitialised value(s)
==700242==    at 0x6D8A2A: getnameinfo_unix (ip.c:253)
==700242==    by 0x6D8BD1: pg_getnameinfo_all (ip.c:122)
==700242==    by 0x4B3EB6: BackendInitialize (postmaster.c:4266)
==700242==    by 0x4B684E: BackendStartup (postmaster.c:4114)
==700242==    by 0x4B6986: ServerLoop (postmaster.c:1780)
==700242==    by 0x4B80CA: PostmasterMain (postmaster.c:1478)
==700242==    by 0x3F7424: main (main.c:197)
==700242==  Uninitialised value was created by a stack allocation
==700242==    at 0x4B6934: ServerLoop (postmaster.c:1737)
==700242==
==700242== VALGRINDERROR-END

That was because the SockAddr struct was not copied correctly.

Per buildfarm animal "skink".
2024-03-12 15:31:02 +02:00
Heikki Linnakangas 4945e4ed4a Move initialization of the Port struct to the child process
In postmaster, use a more lightweight ClientSocket struct that
encapsulates just the socket itself and the remote endpoint's address
that you get from accept() call. ClientSocket is passed to the child
process, which initializes the bigger Port struct. This makes it more
clear what information postmaster initializes, and what is left to the
child process.

Rename the StreamServerPort and StreamConnection functions to make it
more clear what they do. Remove StreamClose, replacing it with plain
closesocket() calls.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-12 13:42:38 +02:00
Heikki Linnakangas d162c3a73b Pass CAC as an argument to the backend process
We used to smuggle it to the child process in the Port struct, but it
seems better to pass it down as a separate argument. This paves the
way for the next commit, which moves the initialization of the Port
struct to the backend process, after forking.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-12 13:42:36 +02:00
Heikki Linnakangas 73f7fb2a4c Set socket options in child process after forking
Try to minimize the work done in the postmaster process for each
accepted connection, so that postmaster can quickly proceed with its
duties. These function calls are very fast so this doesn't make any
measurable performance difference in practice, but it's nice to have
all the socket options initialization code in one place for sake of
readability too. This also paves the way for an upcoming commit that
will move the initialization of the Port struct to the child process.

Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-12 13:42:28 +02:00
Heikki Linnakangas f8c5317d00 Disconnect if socket cannot be put into non-blocking mode
Commit 387da18874 moved the code to put socket into non-blocking mode
from socket_set_nonblocking() into the one-time initialization
function, pq_init(). In socket_set_nonblocking(), there indeed was a
risk of recursion on failure like the comment said, but in pq_init(),
ERROR or FATAL is fine. There's even another elog(FATAL) just after
this, if setting FD_CLOEXEC fails.

Note that COMMERROR merely logged the error, it did not close the
connection, so if putting the socket to non-blocking mode failed we
would use the connection anyway. You might not immediately notice,
because most socket operations in a regular backend wait for the
socket to become readable/writable anyway. But e.g. replication will
be quite broken.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/d40a5cd0-2722-40c5-8755-12e9e811fa3c@iki.fi
2024-03-12 10:18:32 +02:00
Michael Paquier d6e171fed6 Keep replication slot statistics on invalidation
The code path in charge of invalidating a replication slot includes a
call to pgstat_drop_replslot(), which would result in removing the
statistics of the slot once invalidated.  However, there is no need to
remove the statistics of an invalidated slot as one could still be
interested in looking at them to understand the activity of the slot
until its actual removal.

The initial design of the feature committed in be87200efd used the
approach to drop the slots, which is likely why the statistics were
still removed during the invalidation.

Another problem with this operation is that it was done without holding
ReplicationSlotAllocationLock, leaving it unprotected on concurrent
activity.  This part is arguably a bug, but that's a limited problem in
practice so no backpatch is done.

In passing, this commit adds a test to check this behavior.  The only
remaining code path where slot statistics are dropped now related to the
slot getting dropped.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZermH08Eq6YydHpO@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-12 14:22:31 +09:00
Amit Kapila 397cd0b3c7 Remove redundant fetch of the recent flush pointer in WalSndWaitForWal.
In WalSndWaitForWal(), we fetch a recent flush pointer both outside the
loop and inside the loop. But we start using RecentFlushPtr only after we
fetch it inside the loop. So we can remove one outside the loop.

Author: Shveta Malik
Reviewed-by: Bertrand Drouvot, Matthias van de Meent, Amit Kapila
Discussion: https://postgr.es/m/CAJpy0uBSCQz1yMD-WiEthzEe23dti2-Kr_pitVb7vAPFbFKm=A@mail.gmail.com
2024-03-12 10:25:27 +05:30
Michael Paquier 2c8118ee5d Use printf's %m format instead of strerror(errno) in more places
Most callers of strerror() are removed from the backend code.  The
remaining callers require special handling with a saved errno from a
previous system call.  The frontend code still needs strerror() where
error states need to be handled outside of fprintf.

Note that pg_regress is not changed to use %m as the TAP output may
clobber errno, since those functions call fprintf() and friends before
evaluating the format string.

Support for %m in src/port/snprintf.c has been added in d6c55de1f9,
hence all the stable branches currently supported include it.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
2024-03-12 10:02:54 +09:00
Peter Geoghegan 3045324214 Update obsolete index scan TID comments.
Oversight in commit c2fe139c20.
2024-03-11 18:07:10 -04:00
Heikki Linnakangas 3d8652cd32 Remove unneeded vacuum_delay_point from heap_vac_scan_get_next_block
heap_vac_scan_get_next_block() does relatively little work, so there
is no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_get_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.

Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
2024-03-11 20:45:33 +02:00
Heikki Linnakangas 4e76f984a7 Confine vacuum skip logic to lazy_scan_skip()
Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more
code into the function, so that the caller doesn't need to know about
ranges or skipping anymore. heap_vac_scan_next_block() returns the
next block to process, and the logic for determining that block is all
within the function. This makes the skipping logic easier to
understand, as it's all in the same function, and makes the calling
code easier to understand as it's less cluttered. The state variables
needed to manage the skipping logic are moved to LVRelState.

heap_vac_scan_next_block() now manages its own VM buffer separately
from the caller's vmbuffer variable. The caller's vmbuffer holds the
VM page for the current block its processing, while
heap_vac_scan_next_block() keeps a pin on the VM page for the next
unskippable block. Most of the time they are the same, so we hold two
pins on the same buffer, but it's more convenient to manage them
separately.

For readability inside heap_vac_scan_next_block(), move the logic of
finding the next unskippable block to separate function, and add some
comments.

This refactoring will also help future patches to switch to using a
streaming read interface, and eventually AIO
(https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com)

Author: Melanie Plageman, Heikki Linnakangas
Reviewed-by: Andres Freund (older version)
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
2024-03-11 20:43:58 +02:00
Heikki Linnakangas 674e49c73c Set all_visible_according_to_vm correctly with DISABLE_PAGE_SKIPPING
It's important for 'all_visible_according_to_vm' to correctly reflect
whether the VM bit is set or not, even when we are not trusting the VM
to skip pages, because contrary to what the comment said,
lazy_scan_prune() relies on it.

If it's incorrectly set to 'false', when the VM bit is in fact set,
lazy_scan_prune() will try to set the VM bit again and dirty the page
unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
heap pages were dirtied, even if there were no changes. We would also
fail to clear any VM bits that were set incorrectly.

This was broken in commit 980ae17310, so backpatch to v16.

Backpatch-through: 16
Reviewed-by: Melanie Plageman, Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/3df2b582-dc1c-46b6-99b6-38eddd1b2784@iki.fi
2024-03-11 09:28:09 +02:00
Heikki Linnakangas af0e7deb4a Don't destroy SMgrRelations at relcache invalidation
With commit 21d9c3ee4e, SMgrRelations remain valid until end of
transaction (or longer if they're "pinned"). Relcache invalidation can
happen in the middle of a transaction, so we must not destroy them at
relcache invalidation anymore.

This was revealed by failures in the 'constraints' test in buildfarm
animals using -DCLOBBER_CACHE_ALWAYS. That started failing with commit
8af2565248, which was the first commit that started to rely on an
SMgrRelation living until end of transaction.

Diagnosed-by: Tomas Vondra, Thomas Munro
Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com
2024-03-11 09:08:02 +02:00
David Rowley e629846472 Fix incorrect accessing of pfree'd memory in Memoize
For pass-by-reference types, the code added in 0b053e78b, which aimed to
resolve a memory leak, was overly aggressive in resetting the per-tuple
memory context which could result in pfree'd memory being accessed
resulting in failing to find previously cached results in the hash
table.

What was happening was prepare_probe_slot() was switching to the
per-tuple memory context and calling ExecEvalExpr().  ExecEvalExpr() may
have required a memory allocation.  Both MemoizeHash_hash() and
MemoizeHash_equal() were aggressively resetting the per-tuple context
and after determining the hash value, the context would have gotten reset
before MemoizeHash_equal() was called.  This could have resulted in
MemoizeHash_equal() looking at pfree'd memory.

This is less likely to have caused issues on a production build as some
other allocation would have had to have reused the pfree'd memory to
overwrite it.  Otherwise, the original contents would have been intact.
However, this clearly caused issues on MEMORY_CONTEXT_CHECKING builds.

Author: Tender Wang, Andrei Lepikhov
Reported-by: Tender Wang (using SQLancer)
Reviewed-by: Andrei Lepikhov, Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAHewXNnT6N6UJkya0z-jLFzVxcwGfeRQSfhiwA+NyLg-x8iGew@mail.gmail.com
Backpatch-through: 14, where Memoize was added
2024-03-11 18:19:56 +13:00
Michael Paquier b36fbd9f8d Improve consistency of replication slot statistics
The replication slot stats stored in shared memory rely on an internal
index number.  Both pgstat_reset_replslot() and pgstat_fetch_replslot()
lacked some LWLock protections with ReplicationSlotControlLock while
operating on these index numbers.  This issue could cause these two
functions to potentially operate on incorrect slots when taken in
isolation in the event of slots dropped and/or re-created concurrently.

Note that pg_stat_get_replication_slot() is called once per slot when
querying pg_stat_replication_slots, meaning that the stats are retrieved
across multiple ReplicationSlotControlLock acquisitions.  So, while this
commit improves more consistency, it may still be possible that
statistics are not completely consistent for a single scan of
pg_stat_replication_slots under concurrent replication slot drop or
creation activity.

The issue should unlikely be a problem in practice, causing the report
of inconsistent stats or or the stats reset of an incorrect slot, so no
backpatch is done.

Author: Bertrand Drouvot
Reviewed-by: Heikki Linnakangas, Shveta Malik, Michael Paquier
Discussion: https://postgr.es/m/ZeGq1HDWFfLkjh4o@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-11 10:25:01 +09:00
Michael Paquier f500ba07fa Add some checkpoint and redo LSNs to a couple of recovery errors
Two FATALs and one PANIC gain details about the LSNs they fail at:
- When restoring from a backup_label, the FATAL log generated when not
finding the checkpoint record now reports its LSN.
- When restoring from a backup_label, the FATAL log generated when not
finding the redo record referenced by a checkpoint record now shows both
the redo and checkpoint record LSNs.
- When not restoring from a backup_label, the PANIC error generated when
not finding the checkpoint record now reports its LSN.

This information is useful when debugging corruption issues, and these
LSNs may not show up in the logs depending on the level of logging
configured in the backend.

Author: David Steele
Discussion: https://postgr.es/m/0e90da89-77ca-4ccf-872c-9626d755e288@pgmasters.net
2024-03-11 09:08:05 +09:00
Michael Paquier a04ddd077e Improve support for ExplainOneQuery() hook
There is a hook called ExplainOneQuery_hook that gives modules the
possibility to plug into this code path, but, like utility.c for utility
statement execution, there is no corresponding "standard" routine in
the case of EXPLAIN executed for one Query.

This commit adds a new standard_ExplainOneQuery() in explain.c, which is
able to run explain on a non-utility Query without calling its hook.

Per the feedback received from a couple of hackers, this change gives
the possibility to cut a few hundred lines of code in some of the
popular out-of-core modules as these maintained a copy of
ExplainOneQuery(), adding custom extra information at the beginning or
the end of the EXPLAIN output.

Author: Mats Kindahl
Reviewed-by: Aleksander Alekseev, Jelte Fennema-Nio, Andrei Lepikhov
Discussion: https://postgr.es/m/CA+14427V_B4EAoC_o-iYYucRdMSOTfpuH9k-QbexffY1HYJBiA@mail.gmail.com
2024-03-11 08:40:40 +09:00
Jeff Davis f696c0cd5f Catalog changes preparing for builtin collation provider.
Rename pg_collation.colliculocale to colllocale, and
pg_database.daticulocale to datlocale. These names reflects that the
fields will be useful for the upcoming builtin provider as well, not
just for ICU.

This is purely a rename; no changes to the meaning of the fields.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut
2024-03-09 14:48:18 -08:00
Michael Paquier f160bf06f7 Document units of "timeout" in ConditionVariableTimedSleep()
The timeout is passed down to WaitLatch() as milliseconds.

Author: Shveta Malik
Discussion: https://postgr.es/m/CAJpy0uC=xiBQD1WapgYYvOiytap6ULJaakLd867zZXqu9tYc8w@mail.gmail.com
2024-03-09 15:44:41 +09:00
Alvaro Herrera 270af6f0df
Admit deferrable PKs into rd_pkindex, but flag them as such
... and in particular don't return them as replica identity.

The motivation for this change is letting the primary keys be seen by
code that derives NOT NULL constraints from them, when creating
inheritance children; before this change, if you had a deferrable PK,
pg_dump would not recreate the attnotnull marking properly, because the
column would not be considered as having anything to back said marking
after dropping the throwaway NOT NULL constraint.

The reason we don't want these PKs as replica identities is that
replication can corrupt data, if the uniqueness constraint is
transiently broken.

Reported-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b94QonkgsbDXofakHDnORQNgafd1y3Oa5QXfpQNJyXyQ7A@mail.gmail.com
2024-03-08 16:32:29 +01:00
Alexander Korotkov 4c1973fcae Avoid recursion in MemoryContext functions
You might run out of stack space with recursion, which is not nice in
functions that might be used e.g. at cleanup after transaction
abort. MemoryContext contains pointer to parent and siblings, so we
can traverse a tree of contexts iteratively, without using
stack. Refactor the functions to do that.

MemoryContextStats() still recurses, but it now has a limit to how
deep it recurses. Once the limit is reached, it prints just a summary
of the rest of the hierarchy, similar to how it summarizes contexts
with lots of children. That seems good anyway, because a context dump
with hundreds of nested contexts isn't very readable.

Report by Egor Chindyaskin and Alexander Lakhin.

Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane
2024-03-08 13:18:30 +02:00
Alexander Korotkov 6f38c43eb1 Avoid stack overflow in ShowTransactionStateRec()
The function recurses, but didn't perform stack-depth checks. It's
just a debugging aid, so instead of the usual check_stack_depth()
call, stop the printing if we'd risk stack overflow.

Here's an example of how to test this:

    (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") | psql >/dev/null

In the passing, swap building the list of child XIDs and recursing to
parent. That saves memory while recursing, reducing the risk of out of
memory errors with lots of subtransactions. The saving is not very
significant in practice, but this order seems more logical anyway.

Report by Egor Chindyaskin and Alexander Lakhin.

Discussion: https://www.postgresql.org/message-id/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov
2024-03-08 13:18:30 +02:00
Alexander Korotkov fefd9a3fed Turn tail recursion into iteration in CommitTransactionCommand()
Usually the compiler will optimize away the tail recursion anyway, but
if it doesn't, you can drive the function into stack overflow. For
example:

    (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "ERROR; COMMIT;") | psql >/dev/null

In order to get better readability and less changes to the existing code the
recursion-replacing loop is implemented as a wrapper function.

Report by Egor Chindyaskin and Alexander Lakhin.
Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Alexander Korotkov, Heikki Linnakangas
2024-03-08 13:18:30 +02:00
Amit Kapila bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server making them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
Tom Lane 453c468737 Cope with a deficiency in OpenSSL 3.x's error reporting.
In OpenSSL 3.0.0 and later, ERR_reason_error_string randomly refuses
to provide a string for error codes representing system errno values
(e.g., "No such file or directory").  There is a poorly-documented way
to extract the errno from the SSL error code in this case, so do that
and apply strerror, rather than falling back to reporting the error
code's numeric value as we were previously doing.

Problem reported by David Zhang, although this is not his proposed
patch; it's instead based on a suggestion from Heikki Linnakangas.
Back-patch to all supported branches, since any of them are likely
to be used with recent OpenSSL.

Discussion: https://postgr.es/m/b6fb018b-f05c-4afd-abd3-318c649faf18@highgo.ca
2024-03-07 19:38:17 -05:00
Michael Paquier d61a6cad64 Add support for DEFAULT in ALTER TABLE .. SET ACCESS METHOD
This option can be used to switch a relation to use the access method
set by default_table_access_method when running the command.

This has come up when discussing the possibility to support setting
pg_class.relam for partitioned tables (left out here as future work),
while being useful on its own for relations with physical storage as
these must have an access method set.

Per suggestion from Justin Pryzby.

Author: Michael Paquier
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/ZeCZ89xAVFeOmrQC@pryzbyj2023
2024-03-08 09:31:52 +09:00
Peter Eisentraut 6d470211e5 Fix description and grouping of RangeTblEntry.inh
The inh field of RangeTblEntry was doubly confusingly documented.
Some parts of the code insisted that it was only valid for
RTE_RELATION entries, other parts said the field was valid for all
entries.  Neither was quite correct.  More correctly, the field is
valid for RTE_RELATION entries but is also used in the planner for
RTE_SUBQUERY entries.  So it makes more sense to group it with other
fields that are primarily for RTE_RELATION but borrowed by
RTE_SUBQUERY.  (The exact position was chosen so that it is next to
relkind for better struct packing, and next to relid, since relid and
inh are sort of the input fields and the others are filled in later.)
Also add documentation for the planner's use at the struct definition.

Discussion: https://www.postgresql.org/message-id/6c1fbccc-85c8-40d3-b08b-4f47f2093711@eisentraut.org
2024-03-07 12:13:09 +01:00
Dean Rasheed 29ef1dd19b Fix handling of self-modified tuples in MERGE.
When an UPDATE or DELETE action in MERGE returns TM_SelfModified,
there are 2 possible causes:

1). The target tuple was already updated or deleted by the current
    command. This can happen if the target row joins to more than one
    source row, and the SQL standard explicitly says that this must be
    an error.

2). The target tuple was already updated or deleted by a later command
    in the current transaction. This can happen if the tuple is
    modified by a BEFORE trigger or a volatile function used in the
    query, and should be an error for the same reason that it is in a
    plain UPDATE or DELETE command.

In MERGE's primary error handling block, it failed to check for (2),
causing it to return a misleading error message in such cases.

In the secondary error handling block, following a concurrent update
from another session, it failed to check for (1), causing it to
silently ignore target rows joined to more than one source row,
instead of reporting an error.

Fix this, and add tests for both of these cases.

Per report from Wenjiang Zhang. Back-patch to v15, where MERGE was
introduced.

Discussion: https://postgr.es/m/tencent_41DE0FF443FE14B94A5898D373792109E408%40qq.com
2024-03-07 09:57:02 +00:00
John Naylor ee1b30f128 Add template for adaptive radix tree
This implements a radix tree data structure based on the design in
"The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases"
by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main
technique that makes it adaptive is using several different node types,
each with a different capacity of elements, and a different algorithm
for accessing them. The nodes start small and grow/shrink as needed.

The main advantage over hash tables is efficient sorted iteration and
better memory locality when successive keys are lexicographically
close together. The implementation currently assumes 64-bit integer
keys, and traversing the tree is in general slower than a linear
probing hash table, so this is not a general-purpose associative array.

The paper describes two other techniques not implemented here,
namely "path compression" and "lazy expansion". These can further
reduce memory usage and speed up traversal, but the former would add
significant complexity and the latter requires storing the full key
with the value. We do trivially compress the path when leading bytes
of the key are zeros, however.

For value storage, we use "combined pointer/value slots", as
recommended in the paper. Values of size equal or smaller than the the
platform's pointer type are stored in the array of child pointers in
the last level node, while larger values are each stored in a separate
allocation. This is for now fixed at compile time, but it would be
fairly trivial to allow determining at runtime how variable-length
values are stored.

One innovation in our implementation compared to the ART paper is
decoupling the notion of node "size class" from "kind". The size
classes within a given node kind have the same underlying type, but
a variable capacity for children, so we can introduce additional node
sizes with little additional code.

To enable different use cases to specialize for different value types
and for shared/local memory, we use macro-templatized code generation
in the same manner as simplehash.h and sort_template.h.

Future commits will use this infrastructure for storing TIDs.

Patch by Masahiko Sawada and John Naylor, but a substantial amount of
credit is due to Andres Freund, whose proof-of-concept was a valuable
source of coding idioms and awareness of performance pitfalls, and
who reviewed earlier versions.

Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-07 12:40:11 +07:00
Michael Paquier 65db0cfb4c Revert "Add recovery TAP test for race condition with slot invalidations"
This reverts commit 08a52ab151, due to some sporadic instability in
the test.  Getting the test right should require some redesign with a
second injection point, but let's revert it for now to avoid these
issues in the CI as a lot of patches are under discussion in this last
commit fest.

Per buildfarm members hachi and gokiburi.

Discussion: https://postgr.es/m/ZekQQHCrIqLVpGz5@paquier.xyz
2024-03-07 09:57:52 +09:00
Michael Paquier 099ca50bd4 Revert "Fix parallel-safety check of expressions and predicate for index builds"
This reverts commit eae7be600b, following a discussion with Tom Lane,
due to concerns that this impacts the decisions made by the planner for
the number of workers spawned based on the inlining and const-folding of
index expressions and predicate for cases that would have worked until
this commit.

Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
2024-03-07 08:30:35 +09:00
Tom Lane 2ed8f9a01e Fix type-checking of RECORD-returning functions in FROM.
In the corner case where a function returning RECORD has been
simplified to a RECORD constant or an inlined ROW() expression,
ExecInitFunctionScan failed to cross-check the function's result
rowtype against the coldeflist provided by the calling query.
That happened because get_expr_result_type is able to extract a
tupdesc from such expressions, which led ExecInitFunctionScan to
ignore the coldeflist.  (Instead, it used the extracted tupdesc
to check the function's output, which of course always succeeds.)

I have not been able to demonstrate any really serious consequences
from this, because if some column of the result is of the wrong
type and is directly referenced by a Var of the calling query,
CheckVarSlotCompatibility will catch it.  However, we definitely do
fail to report the case where the function returns more columns than
the coldeflist expects, and in the converse case where it returns
fewer columns, we get an assert failure (but, seemingly, no worse
results in non-assert builds).

To fix, always build the expected tupdesc from the coldeflist if there
is one, and consult get_expr_result_type only when there isn't one.

Also remove the failing Assert, even though it is no longer reached
after this fix.  It doesn't seem to be adding anything useful, since
later checking will deal with cases with the wrong number of columns.

The only other place I could find that is doing something similar
is inline_set_returning_function.  There's no live bug there because
we cannot be looking at a Const or RowExpr, but for consistency
change that code to agree with ExecInitFunctionScan.

Per report from PetSerAl.  After some debate I've concluded that
this should be back-patched.  There is a small risk that somebody
has been relying on such a case not throwing an error, but I judge
this outweighed by the risk that I've missed some way in which the
failure to cross-check has worse consequences than sketched above.

Discussion: https://postgr.es/m/CAKygsHSerA1eXsJHR9wft3Gn3wfHQ5RfP8XHBzF70_qcrrRvEg@mail.gmail.com
2024-03-06 14:41:13 -05:00
Michael Paquier eae7be600b Fix parallel-safety check of expressions and predicate for index builds
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().

As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers.  Depending on the expressions
or predicate used, this could cause the parallel build to fail.

Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions.  Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.

Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
2024-03-06 17:23:56 +09:00
John Naylor 3e76a806cb Move some bitmap logic out of bitmapset.c
Move the logic for selecting appropriate pg_bitutils.h
functions based on word size to bitmapset.h for wider
visibility.

Reviewed (in a previous version) by Tom Lane
Discussion: https://postgr.es/m/CAFBsxsFW2JjTo58jtDB%2B3sZhxMx3t-3evew8%3DAcr%2BGGhC%2BkFaA%40mail.gmail.com
2024-03-06 14:30:16 +07:00
Michael Paquier 08a52ab151 Add recovery TAP test for race condition with slot invalidations
This commit adds a recovery test to provide coverage for the bug fixed
in 818fefd8fd, using an injection point to wait just after the process
of an active slot is killed.  The trick is to give enough time for
effective_xmin and effective_catalog_xmin to advance so as the slot
invalidation robustness can be checked since the active process is
killed without holding its slot's mutex for a short time.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-06 14:39:40 +09:00
David Rowley 2bce0ad67f Remove surplus trailing semicolon
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-qjotfa7G=5PEOw4LDDDX58MmTwDdpdoU3Quse_BKv1Q@mail.gmail.com
2024-03-06 10:57:31 +13:00
Jeff Davis 0984a3b851 Run pgindent again on the same file.
Apparently, pgindent got confused by the double space. The first time
I ran it, it moved the function name to the next line. The second time
I ran it, it moved the function name back, but without the double
space.

Now the results appear stable.
2024-03-05 11:16:23 -08:00
Jeff Davis b406af1806 Run pgindent for commit ef4cfdce0e. 2024-03-05 10:58:24 -08:00
Heikki Linnakangas ef4cfdce0e Fix references to renamed function in comments
I renamed the function in commit 024c521117, but missed these
comments.

Reported-by: Richard Guo
Discussion: https://www.postgresql.org/message-id/CAMbWs4-jR6qc7JRMKwz-zXQy_AYLUZ3PHjGep4B91of321cqWw@mail.gmail.com
2024-03-05 18:23:58 +02:00
Peter Eisentraut e03349144b Improve field order in RangeTblEntry
When perminfoindex was added, it was just added at the end of the
block.  It would make sense to keep it closer to more related fields.
In passing, also add an inline comment, like the other fields have.
(Other field reorderings and documentation improvements in
RangeTblEntry are being discussed, but it's better not to mix them
together.)

Discussion: https://www.postgresql.org/message-id/flat/6c1fbccc-85c8-40d3-b08b-4f47f2093711%40eisentraut.org
2024-03-05 13:34:43 +01:00
Alvaro Herrera 0d3a71d0c8
Fix misspelled assertions
Remove an extra & operator, per Tom Lane.  My bugs, introduced with
commit 53c2a97a92.

Discussion: https://postgr.es/m/3885480.1709590472@sss.pgh.pa.us
2024-03-05 12:13:12 +01:00
Alvaro Herrera 1a2654b32b
Rework redundant code in subtrans.c
When this code was written the duplicity didn't matter, but with all the
SLRU-bank stuff we just added, it has become excessive.  Turn it into a
simpler loop with no code duplication.  Also add a test so that this
code becomes covered.

Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql
2024-03-05 12:09:18 +01:00
Peter Eisentraut 030e10ff1a Rename pg_constraint.conwithoutoverlaps to conperiod
pg_constraint.conwithoutoverlaps was recently added to support primary
keys and unique constraints with the WITHOUT OVERLAPS clause.  An
upcoming patch provides the foreign-key side of this functionality,
but the syntax there is different and uses the keyword PERIOD.  It
would make sense to use the same pg_constraint field for both of
these, but then we should pick a more general name that conveys "this
constraint has a temporal/period-related feature".  conperiod works
for that and is nicely compact.  Changing this now avoids possibly
having to introduce versioning into clients.  Note there are still
some "without overlaps" variables left, which deal specifically with
the parsing of the primary key/unique constraint feature.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-03-05 11:24:17 +01:00
Heikki Linnakangas 55cdba2647 Fix a leftover reference to backend_id in comment
Commit 024c521117 replaced backend_id with proc_number.

Reported-by: Alexander Lakhin
2024-03-05 09:15:02 +02:00
Jeff Davis 59825d1639 Fix buildfarm failures from 2af07e2f74.
Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working
with parallel query.

Now that the call requires more arguments, wrap the call in a new
function to avoid code duplication and offer a place for a comment.

Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org
2024-03-04 19:42:16 -08:00
David Rowley a37a3e2b36 Fix incorrectly reported stats kind in "can't happen" ERROR
The error message(s) were reporting the stats kind of 'f', which is not
correct as that's for the "dependencies" statistics kind.

Reported-by: Horst Reiterer
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/18375-ba99383eb9062d6a@postgresql.org
Backpatch-through: 12, where MCV extended stats were added.
2024-03-05 16:17:02 +13:00
Jeff Davis 2af07e2f74 Fix search_path to a safe value during maintenance operations.
While executing maintenance operations (ANALYZE, CLUSTER, REFRESH
MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to
'pg_catalog, pg_temp' to prevent inconsistent behavior.

Functions that are used for functional indexes, in index expressions,
or in materialized views and depend on a different search path must be
declared with CREATE FUNCTION ... SET search_path='...'.

This change was previously committed as 05e1737351, then reverted in
commit 2fcc7ee7af because it was too late in the cycle.

Preparation for the MAINTAIN privilege, which was previously reverted
due to search_path manipulation hazards.

Discussion: https://postgr.es/m/d4ccaf3658cb3c281ec88c851a09733cd9482f22.camel@j-davis.com
Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org
Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com
Reviewed-by: Greg Stark, Nathan Bossart, Noah Misch
2024-03-04 17:31:38 -08:00
Nathan Bossart 2c29e7fc95 Add macro for customizing an archiving WARNING message.
Presently, if an archive module's check_configured_cb callback
returns false, a generic WARNING message is emitted, which
unfortunately provides no actionable details about the reason why
the module is not configured.  This commit introduces a macro that
archive module authors can use to add a DETAIL line to this WARNING
message.

Co-authored-by: Tung Nguyen
Reviewed-by: Daniel Gustafsson, Álvaro Herrera
Discussion: https://postgr.es/m/4109578306242a7cd5661171647e11b2%40oss.nttdata.com
2024-03-04 15:41:42 -06:00
Tom Lane e5bc9454e5 Explicitly list dependent types as extension members in pg_depend.
Auto-generated array types, multirange types, and relation rowtypes
are treated as dependent objects: they can't be dropped separately
from the base object, nor can they have their own ownership or
permissions.  We previously felt that, for objects that are in an
extension, only the base object needs to be listed as an extension
member in pg_depend.  While that's sufficient to prevent inappropriate
drops, it results in undesirable answers if someone asks whether a
dependent type belongs to the extension.  It looks like the dependent
type is just some random separately-created object that happens to
depend on the base object.  Notably, this results in postgres_fdw
concluding that expressions involving an array type are not shippable
to the remote server, even when the defining extension has been
whitelisted.

To fix, cause GenerateTypeDependencies to make extension dependencies
for dependent types as well as their base objects, and adjust
ExecAlterExtensionContentsStmt so that object addition and removal
operations recurse to dependent types.  The latter change means that
pg_upgrade of a type-defining extension will end with the dependent
type(s) now also listed as extension members, even if they were
not that way in the source database.  Normally we want pg_upgrade
to precisely reproduce the source extension's state, but it seems
desirable to make an exception here.

This is arguably a bug fix, but we can't back-patch it since it
causes changes in the expected contents of pg_depend.  (Because
it does, I've bumped catversion, even though there's no change
in the immediate post-initdb catalog contents.)

Tom Lane and David Geier

Discussion: https://postgr.es/m/4a847c55-489f-4e8d-a664-fc6b1cbe306f@gmail.com
2024-03-04 14:49:36 -05:00
Robert Haas dd7ea37c43 Fix pgindent damage.
Apparently, I neglected to pgindent the prior commit.

Per buildfarm.
2024-03-04 14:37:35 -05:00
Robert Haas d75c4027b6 Fix incremental backup interaction with XLOG_DBASE_CREATE_FILE_COPY.
After XLOG_DBASE_CREATE_FILE_COPY, a correct incremental backup needs
to copy in full everything with the database and tablespace OID
mentioned in that record; but that record doesn't specifically mention
the blocks, or even the relfilenumbers, of the affected relations.
As a result, we were failing to copy data that we should have copied.

To fix, enter the DB OID and tablespace OID into the block reference
table with relfilenumber 0 and limit block 0; and treat that as a
limit block of 0 for every relfilenumber whose DB OID and tablespace
OID match.

Also, add a test case.

Patch by me, reviewed by Noah Misch.

Discussion: http://postgr.es/m/CA+Tgmob0xa=ByvGLMdAgkUZyVQE=r4nyYZ_VEa40FCfEDFnTKA@mail.gmail.com
2024-03-04 13:33:28 -05:00
Alvaro Herrera a0b808baef
Rework locking code in GetMultiXactIdMembers
After commit 53c2a97a92, the code flow around the "retry" goto label
in GetMultiXactIdMembers was confused about what was possible: we never
return there with a held lock, so there's no point in testing for one.
This realization lets us simplify the code a bit.  While at it, make the
scope of a couple of local variables in the same function a bit tighter.

Per Coverity.
2024-03-04 17:48:01 +01:00
Alvaro Herrera f9baaf96d3
Simplify coding in slru.c
New code in 53c2a97a92 uses direct array access to
shared->bank_locks[bankno].lock which can be made a little bit more
legible by using the SimpleLruGetBankLock helper function.
Nothing terribly serious, but let's add some clarity.

Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql
2024-03-04 17:37:47 +01:00
Peter Eisentraut 43a8875f49 Put back required #include
Fix for dbbca2cf29: "storage/shmem.h" is required with
-Dspinlocks=false.
2024-03-04 13:17:37 +01:00
Daniel Gustafsson cc09e6549f Remove the adminpack contrib extension
The adminpack extension was only used to support pgAdmin III,  which
in turn was declared EOL many years ago. Removing the extension also
allows us to remove functions from core as well which were only used
to support old version of adminpack.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACUmL5TraYBUBqDZBi1C+Re8_=SekqGYqYprj_W8wygQ8w@mail.gmail.com
2024-03-04 12:39:22 +01:00
Peter Eisentraut dbbca2cf29 Remove unused #include's from backend .c files
as determined by include-what-you-use (IWYU)

While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that.  In some cases, a more
specific #include replaces another less specific one.

Some manual adjustments of the automatic result:

- IWYU currently doesn't know about includes that provide global
  variable declarations (like -Wmissing-variable-declarations), so
  those includes are being kept manually.

- All includes for port(ability) headers are being kept for now, to
  play it safe.

- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
  patch from exploding in size.

Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.

As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.

Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04 12:02:20 +01:00
Heikki Linnakangas 24eebc65c2 Remove unused 'countincludesself' argument to pq_sendcountedtext()
It has been unused since we removed support for protocol version 2.
2024-03-04 12:56:05 +02:00
Heikki Linnakangas 0dd094c4a0 Remove unused ParallelWorkerInfo.pid field
The pid was originally used in error context of messages propagated
from parallel workers, but commit 292794f82b removed that. If the need
arises in the future, you can also get the pid with
"shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".
2024-03-04 12:56:02 +02:00
Heikki Linnakangas 393b5599e5 Use MyBackendType in more places to check what process this is
Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(),
IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of
new Am*Process() macros that use MyBackendType. For consistency with
the existing Am*Process() macros.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-03-04 10:25:12 +02:00
Heikki Linnakangas 067701f577 Remove MyAuxProcType, use MyBackendType instead
MyAuxProcType was redundant with MyBackendType.

Reviewed-by: Reid Thompson, Andres Freund
Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-03-04 10:25:09 +02:00
David Rowley a0cd954480 Optimize GenerationAlloc() and SlabAlloc()
In a similar effort to 413c18401, separate out the hot and cold paths in
GenerationAlloc() and SlabAlloc() to avoid having to setup the stack frame
for the hot path.

This additionally adjusts how we use the GenerationContext's freeblock.
Freeblock, when set, is now always empty and we only switch to using it
when the current allocation request finds the current block does not have
enough space and the freeblock is large enough to accomodate the
allocation.

This commit also adjusts GenerationFree() so that if we pfree the final
allocation in the current generation block, we now mark that block as
empty and keep it as the current block.  Previously we free'd that block
and set the current block to NULL.  Doing that meant we needed a special
case in GenerationAlloc to check if GenerationContext.block was NULL.
So this both reduces free/malloc calls and reduces the work done in
GenerationAlloc().

In passing, improve some comments in aset.c

Discussion: https://postgr.es/m/CAApHDvpHVSJqqb4B4OZLixr=CotKq-eKkbwZqvZVo_biYvUvQA@mail.gmail.com
2024-03-04 17:42:10 +13:00
David Rowley 07c36c1333 Support partition pruning on boolcol IS [NOT] UNKNOWN
While working on 4c2369ac5, I noticed we went out of our way not to
support clauses on boolean partitioned tables in the form of "IS
UNKNOWN" and "IS NOT UNKNOWN".  It's almost as much code to disallow
this as it is to allow it, so let's allow it.

Discussion: https://postgr.es/m/CAApHDvobKtcN6+xOuOfcutfp6T7jP=JPA9y3=MAEqnuKdDsQrw@mail.gmail.com
2024-03-04 14:40:22 +13:00
Michael Paquier 6782709df8 Add regression test for restart points during promotion
This test serves as a way to demonstrate how to use the features
introduced in 37b369dc67, while providing coverage for 7863ee4def
that caused the startup process to throw "PANIC: could not locate a
valid checkpoint record" when starting recovery.  The test checks that a
node is able to properly restart following a crash when a restart point
was finishing across a promotion, with an injection point added in the
middle of CreateRestartPoint() to stop the restartpoint in flight.  Note
that this test fails when 7863ee4def is reverted.

Kyotaro Horiguchi is the original author of this test, that has been
originally posted on the thread where 7863ee4def was discussed.  I
have just upgraded and polished it to rely on injection points, making
it much cheaper to reproduce the failure.

This test requires injection points to be enabled in the builds, hence
meson and ./configure need an update to pass this knowledge down to the
test.  The name of the new injection point follows the same naming
convention as 6a1ea02c49.  The Makefile's EXTRA_INSTALL of recovery
TAP tests is updated to include modules/injection_points.

Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Andrey Borodin, Bertrand Drouvot
Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz
2024-03-04 09:49:03 +09:00
Heikki Linnakangas 024c521117 Replace BackendIds with 0-based ProcNumbers
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.

The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.

One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for backend with ProcNumber 0.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:38:22 +02:00
Heikki Linnakangas ab355e3a88 Redefine backend ID to be an index into the proc array
Previously, backend ID was an index into the ProcState array, in the
shared cache invalidation manager (sinvaladt.c). The entry in the
ProcState array was reserved at backend startup by scanning the array
for a free entry, and that was also when the backend got its backend
ID. Things become slightly simpler if we redefine backend ID to be the
index into the PGPROC array, and directly use it also as an index to
the ProcState array. This uses a little more memory, as we reserve a
few extra slots in the ProcState array for aux processes that don't
need them, but the simplicity is worth it.

Aux processes now also have a backend ID. This simplifies the
reservation of BackendStatusArray and ProcSignal slots.

You can now convert a backend ID into an index into the PGPROC array
simply by subtracting 1. We still use 0-based "pgprocnos" in various
places, for indexes into the PGPROC array, but the only difference now
is that backend IDs start at 1 while pgprocnos start at 0. (The next
commmit will get rid of the term "backend ID" altogether and make
everything 0-based.)

There is still a 'backendId' field in PGPROC, now part of 'vxid' which
encapsulates the backend ID and local transaction ID together. It's
needed for prepared xacts. For regular backends, the backendId is
always equal to pgprocno + 1, but for prepared xact PGPROC entries,
it's the ID of the original backend that processed the transaction.

Reviewed-by: Andres Freund, Reid Thompson
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:37:28 +02:00
Alvaro Herrera 30b8d6e4ce
GUC table: Add description to computed variables
Per suggestion from Kyotaro Horiguchi
Discussion: https://postgr.es/m/20240229.130404.1411153273308142188.horikyota.ntt@gmail.com
2024-03-03 14:53:47 +01:00
Thomas Munro 653b55b570 Return ssize_t in fd.c I/O functions.
In the past, FileRead() and FileWrite() used types based on the Unix
read() and write() functions from before C and POSIX standardization,
though not exactly (we had int for amount instead of unsigned).  In
commit 2d4f1ba6 we changed to the appropriate standard C types, just
like the modern POSIX functions they wrap, but again not exactly: the
return type stayed as int.  In theory, a ssize_t value could be returned
by the underlying call that is too large for an int.

That wasn't really a live bug, because we don't expect PostgreSQL code
to perform reads or writes of gigabytes, and OSes probably apply
internal caps smaller than that anyway.  This change is done on the
principle that the return might as well follow the standard interfaces
consistently.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us
2024-03-02 12:09:28 +13:00
Michael Paquier 655dc31046 Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntax
This commit switches pg_enc2gettext_tbl[] in encnames.c to use a
C99-designated initializer syntax.

pg_bind_textdomain_codeset() is simplified so as it is possible to do
a direct lookup at the gettext() array with a value of the enum pg_enc
rather than doing a loop through all its elements, as long as the
encoding value provided by GetDatabaseEncoding() is in the correct range
of supported encoding values.  Note that PG_MULE_INTERNAL gains a value
in the array, pointing to NULL.

Author: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-03-01 18:03:48 +09:00
Nathan Bossart 963d3072af Convert unloggedLSN to an atomic variable.
Currently, this variable is an XLogRecPtr protected by a spinlock.
By converting it to an atomic variable, we can remove the spinlock,
which saves a small amount of shared memory space.  Since this code
is not performance-critical, we use atomic operations with full
barrier semantics to make it easy to reason about correctness.

Author: John Morris
Reviewed-by: Michael Paquier, Robert Haas, Andres Freund, Stephen Frost, Bharath Rupireddy
Discussion: https://postgr.es/m/BYAPR13MB26772534335255E50318C574A0409%40BYAPR13MB2677.namprd13.prod.outlook.com
Discussion: https://postgr.es/m/MN2PR13MB2688FD8B757316CB5C54C8A2A0DDA%40MN2PR13MB2688.namprd13.prod.outlook.com
2024-02-29 14:34:10 -06:00
Nathan Bossart 3179701505 Convert archiver's force_dir_scan variable to an atomic variable.
Commit bd5132db55 introduced new atomic read/write functions with
full barrier semantics, which are intended to simplify converting
non-performance-critical code to use atomic variables.  This commit
demonstrates one such conversion.

Reviewed-by: Yong Li
Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13
2024-02-29 10:17:55 -06:00
Dean Rasheed 5f2e179bd3 Support MERGE into updatable views.
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.

A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.

Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.

Dean Rasheed, reviewed by Jian He and Alvaro Herrera.

Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
2024-02-29 15:56:59 +00:00
Peter Eisentraut 8b29a119fd Add missing RangeTblEntry field to jumble
RangeTblEntry.funcordinality should be jumbled, because the WITH
ORDINALITY clause changes the query result.

This was apparently an oversight in the past.

Discussion: https://www.postgresql.org/message-id/flat/d7f421f8-fd6d-4759-adc3-247090a5d44b%40eisentraut.org
2024-02-29 14:09:09 +01:00
Dean Rasheed 362de947cd Remove field UpdateContext->updated in nodeModifyTable.c
This field has been redundant ever since it was added by commit
25e777cf8e, which split up ExecUpdate() and ExecDelete() into reusable
pieces. The only place that reads it is ExecMergeMatched(), if the
result from ExecUpdateAct() is TM_Ok. However, all paths through
ExecUpdateAct() that return TM_Ok also set this field to true, so the
return status by itself is sufficient to tell if the update happened.

Removing this field is a modest simplification, and it brings the
UPDATE path in ExecMergeMatched() more into line with ExecUpdate(),
ensuring that ExecUpdateEpilogue() is always called if ExecUpdateAct()
returns TM_Ok, reducing the chance of bugs.

Dean Rasheed, reviewed by Alvaro Herrera.

Discussion: https://postgr.es/m/CAEZATCWGGmigGBzLHkJm5Ccv2mMxXmwi3%2Buq0yhwDHm-tsvSLg%40mail.gmail.com
2024-02-29 11:49:30 +00:00
Daniel Gustafsson 6fd144e3a9 Fix integer underflow in shared memory debugging
dsa_dump would print a large negative number instead of zero for
segment bin 0.  Fix by explicitly checking for underflow and add
special case for bin 0. Backpatch to all supported versions.

Author: Ian Ilyasov <ianilyasov@outlook.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/GV1P251MB1004E0D09D117D3CECF9256ECD502@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM
Backpatch-through: v12
2024-02-29 12:19:52 +01:00
Amit Kapila b3f6b14cf4 Fixups for commit 93db6cbda0.
Ensure to set always-secure search path for both local and remote
connections during slot synchronization, so that malicious users can't
redirect user code (e.g. operators).

In the passing, improve the name of define, remove spurious return
statement, and a minor change in one of the comments.

Author: Bertrand Drouvot and Shveta Malik
Reviewed-by: Amit Kapila, Peter Smith
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com
2024-02-29 09:45:20 +05:30
Tom Lane d163fdbfea Fix mis-rounding and overflow hazards in date_bin().
In the case where the target timestamp is before the origin timestamp
and their difference is already an exact multiple of the stride, the
code incorrectly subtracted the stride anyway.

Also detect several integer-overflow cases that previously produced
bogus results.  (The submitted patch tried to avoid overflow, but
I'm not convinced it's right, and problematic cases are so far out of
the plausibly-useful range that they don't seem worth sweating over.
Let's just use overflow-detecting arithmetic and throw errors.)

timestamp_bin() and timestamptz_bin() are basically identical and
so had identical bugs.  Fix both.

Report and patch by Moaaz Assali, adjusted some by me.  Back-patch
to v14 where date_bin() was introduced.

Discussion: https://postgr.es/m/CALkF+nvtuas-2kydG-WfofbRSJpyODAJWun==W-yO5j2R4meqA@mail.gmail.com
2024-02-28 14:00:30 -05:00
Alvaro Herrera 53c2a97a92
Improve performance of subsystems on top of SLRU
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches.  In order for this to work with good
performance, two additional changes are made:

1. the cache is divided in "banks" (to borrow terminology from CPU
   caches), and algorithms such as eviction buffer search only affect
   one specific bank.  This forestalls the problem that linear searching
   for a specific buffer across the whole cache takes too long: we only
   have to search the specific bank, whose size is small.  This work is
   authored by Andrey Borodin.

2. Change the locking regime for the SLRU banks, so that each bank uses
   a separate LWLock.  This allows for increased scalability.  This work
   is authored by Dilip Kumar.  (A part of this was previously committed as
   d172b717c6f4.)

Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next.  This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this.  I (Álvaro) also added
lots of comments to make sure the design is sound.

The new GUCs match the names introduced by bcdfa5f2e2 in the
pg_stat_slru view.

The default values for these parameters are similar to the previous
sizes of each SLRU.  commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB.  (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.)  The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0.  These values based on
shared_buffers may need to be revisited, but that's an easy change.

There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally).  However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
	Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
	Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
2024-02-28 17:05:31 +01:00
Heikki Linnakangas 0b16bb8776 Remove AIX support
There isn't a lot of user demand for AIX support, we have a bunch of
hacks to work around AIX-specific compiler bugs and idiosyncrasies,
and no one has stepped up to the plate to properly maintain it.
Remove support for AIX to get rid of that maintenance overhead. It's
still supported for stable versions.

The acute issue that triggered this decision was that after commit
8af2565248, the AIX buildfarm members have been hitting this
assertion:

    TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728

Apperently the "pg_attribute_aligned(a)" attribute doesn't work on AIX
for values larger than PG_IO_ALIGN_SIZE, for a static const variable.
That could be worked around, but we decided to just drop the AIX support
instead.

Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com
Reviewed-by: Andres Freund, Noah Misch, Thomas Munro
2024-02-28 15:17:23 +04:00
Alvaro Herrera bcdfa5f2e2
Rename SLRU elements in view pg_stat_slru
The new names are intended to match those in an upcoming patch that adds
a few GUCs to configure the SLRU buffer sizes.

Backwards compatibility concern: this changes the accepted names for
function pg_stat_slru_rest().  Since this function recognizes "any other
string" as a request to reset the entry for "other", this means that
calling it with the old names would silently reset "other" instead of
doing nothing or throwing an error.

Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
2024-02-28 09:39:52 +01:00
Michael Paquier 48920476b4 Remove last NULL element in config_group_names[]
This has not been needed since 9d77708d83 where there was a loop to
print all the possible GUC groups, relying on the last element to be
NULL.

Author: Japin Li
Reviewed-By: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-28 12:51:35 +09:00
David Rowley 413c18401d Refactor AllocSetAlloc(), separating hot and cold paths
Allocating from a free list or from a block which contains enough space
already, we deem to be common code paths and want to optimize for those.
Having to allocate a new block, either a normal block or a dedicated one
for a large allocation, we deem to be less common, therefore we class
that as "cold".  Both cold paths require a malloc so are going to be
slower as a result of that regardless.

The main motivation here is to remove the calls to malloc() in the hot
path and because of this, the compiler is now free to not bother setting
up the stack frame in AllocSetAlloc(), thus making the hot path much
cheaper.

Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
2024-02-28 14:20:43 +13:00
Michael Paquier afd8ef3909 Use C99-designated initializer syntax for more arrays
This is in the same spirit as ef5e2e9085, updating this time some
arrays in parser.c, relpath.c, guc_tables.c and pg_dump_sort.c so as the
order of their elements has no need to match the enum structures they
are based on anymore.

Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-28 08:42:36 +09:00
Nathan Bossart e1724af42c Fix comments for the dshash_parameters struct.
A recent commit added a copy_function member to the
dshash_parameters struct, but it missed updating a couple of
comments that refer to the function pointer members of this struct.
One of those comments also refers to a tranche_name member and non-
arg variants of the function pointer members, all of which were
either removed during development or removed shortly after dshash
table support was committed.

Oversights in commits 8c0d7bafad, d7694fc148, and 42a1de3013.

Discussion: https://postgr.es/m/20240227045213.GA2329190%40nathanxps13
2024-02-27 09:44:59 -06:00
Andrew Dunstan 92d2ab7554 Rationalize and improve error messages for some jsonpath items
This is a followup to commit 66ea94e8e6.

Error mssages concerning incorrect formats for date-time types are
unified and parameterized, instead of using a fully separate error
message for each type.

Similarly, error messages regarding numeric and string arguments to
certain items are standardized, and instead of saying that the argument
is out of range simply say that it is invalid. The actual invalid
arguments to these itesm are now shown in the error message.

Error messages relating to numeric inputs of Nan or Infinity are
made more informative.

Jeevan Chalke and Kyotaro Horiguchi, with some input from Tom Lane.

Discussion: https://postgr.es/m/20240129.121200.235012930453045390.horikyota.ntt@gmail.com
2024-02-27 02:07:22 -05:00
Michael Paquier ef5e2e9085 Remove unnecessary array object_classes[] in dependency.c
object_classes[] provided unnecessary indirection between catalog OIDs
and the enum ObjectClass when calling add_object_address().  This array
has been originally introduced in 30ec31604d and was useful because not
all relation OIDs were compile-time constants back then, which has not
been the case for a long time now for all the elements of ObjectClass.

This commit removes object_classes[], switching to the catalog OIDs
when calling add_object_address().  This shaves some code while saving
in maintenance because it was necessary to maintain the enum ObjectClass
and the array in sync when adding new object types.

Reported-by: Jeff Davis
Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-27 15:18:17 +09:00
David Rowley 743112a2e9 Adjust memory allocation functions to allow sibling calls
Many modern compilers are able to optimize function calls to functions
where the parameters of the called function match a leading subset of
the calling function's parameters.  If there are no instructions in the
calling function after the function is called, then the compiler is free
to avoid any stack frame setup and implement the function call as a
"jmp" rather than a "call".  This is called sibling call optimization.

Here we adjust the memory allocation functions in mcxt.c to allow this
optimization.  This requires moving some responsibility into the memory
context implementations themselves.  It's now the responsibility of the
MemoryContext to check for malloc failures.  This is good as it both
allows the sibling call optimization, but also because most small and
medium allocations won't call malloc and just allocate memory to an
existing block.  That can't fail, so checking for NULLs in that case
isn't required.

Also, traditionally it's been the responsibility of palloc and the other
allocation functions in mcxt.c to check for invalid allocation size
requests.  Here we also move the responsibility of checking that into the
MemoryContext.  This isn't to allow the sibling call optimization, but
more because most of our allocators handle large allocations separately
and we can just add the size check when doing large allocations.  We no
longer check this for non-large allocations at all.

To make checking the allocation request sizes and ERROR handling easier,
add some helper functions to mcxt.c for the allocators to use.

Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
2024-02-27 16:39:42 +13:00
Michael Paquier 17a3f79f81 Fix comment thinko in sequence.c
One comment mentioned indexes, but the relation opened should be
sequences.

Reported-by: Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2WiMGNG9XK3NSUen-5BARhCnP=u=FXnf8pvpL2qDKeOsZg@mail.gmail.com
2024-02-27 08:19:39 +09:00
Nathan Bossart 42a1de3013 Add helper functions for dshash tables with string keys.
Presently, string keys are not well-supported for dshash tables.
The dshash code always copies key_size bytes into new entries'
keys, and dshash.h only provides compare and hash functions that
forward to memcmp() and tag_hash(), both of which do not stop at
the first NUL.  This means that callers must pad string keys so
that the data beyond the first NUL does not adversely affect the
results of copying, comparing, and hashing the keys.

To better support string keys in dshash tables, this commit does
a couple things:

* A new copy_function field is added to the dshash_parameters
  struct.  This function pointer specifies how the key should be
  copied into new table entries.  For example, we only want to copy
  up to the first NUL byte for string keys.  A dshash_memcpy()
  helper function is provided and used for all existing in-tree
  dshash tables without string keys.

* A set of helper functions for string keys are provided.  These
  helper functions forward to strcmp(), strcpy(), and
  string_hash(), all of which ignore data beyond the first NUL.

This commit also adjusts the DSM registry's dshash table to use the
new helper functions for string keys.

Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
2024-02-26 15:47:13 -06:00
Nathan Bossart 5fe08c006c Use NULL instead of 0 for 'arg' argument in dshash_create() calls.
A couple of dshash_create() callers provide 0 for the 'void *arg'
argument, which might give readers the incorrect impression that
this is some sort of "flags" parameter.

Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
2024-02-26 15:46:01 -06:00
Alvaro Herrera 5f79cb7629
slru.c: Reduce scope of variables in 'for' blocks
Pretty boring.
2024-02-26 16:49:50 +01:00
Michael Paquier 6e951bf98e Group more closely cache updates for backends in sequence.c
Information of sequences is cached for each backend for currval() and
nextval(), and the update of some cached information was mixed in the
middle of computations based on the other properties of a sequence, for
the increment value in nextval() and the cached state when altering a
sequence.

Grouping them makes the code easier to follow and to refactor in the
future, when splitting the computation and the SeqTable change parts.
Note that the cached data is untouched between the areas where these
cache updates are moved.

Issue noticed while doing some refactoring of the sequence code.

Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
2024-02-26 17:03:18 +09:00
Michael Paquier 449e798c77 Introduce sequence_*() access functions
Similarly to tables and indexes, these functions are able to open
relations with a sequence relkind, which is useful to make a distinction
with the other relation kinds.  Previously, commands/sequence.c used a
mix of table_{close,open}() and relation_{close,open}() routines when
manipulating sequence relations, so this clarifies the code.

A direct effect of this change is to align the error messages produced
when attempting DDLs for sequences on relations with an unexpected
relkind, like a table or an index with ALTER SEQUENCE, providing an
extra error detail about the relkind of the relation used in the DDL
query.

Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
2024-02-26 16:04:59 +09:00
Peter Eisentraut 025f0a6f91 Fix incorrect format placeholder
Not only did the format placeholder not match the variable, the
variable also didn't match the function it was getting its value from.
2024-02-26 07:16:31 +01:00
Tom Lane f5a465f1a0 Promote assertion about !ReindexIsProcessingIndex to runtime error.
When this assertion was installed (in commit d2f60a3ab), I thought
it was only for catching server logic errors that caused accesses to
catalogs that were undergoing index rebuilds.  However, it will also
fire in case of a user-defined index expression that attempts to
access its own table.  We occasionally see reports of people trying
to do that, and typically getting unintelligible low-level errors
as a result.  We can provide a more on-point message by making this
a regular runtime check.

While at it, adjust the similar error check in
systable_beginscan_ordered to use the same message text.  That one
is (probably) not reachable without a coding bug, but we might as
well use a translatable message if we have one.

Per bug #18363 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18363-e3598a5a572d0699@postgresql.org
2024-02-25 16:15:07 -05:00
Alexander Korotkov 28e858c0f9 Improve documentation and GUC description for transaction_timeout
Reported-by: Alexander Lakhin
2024-02-25 20:30:17 +02:00
Alexander Korotkov 466979ef03 Replace lateral references to removed rels in subqueries
This commit introduces a new field 'sublevels_up' in ReplaceVarnoContext,
and enhances replace_varno_walker() to:
  1) recurse into subselects with sublevels_up increased, and
  2) perform the replacement only when varlevelsup is equal to sublevels_up.

This commit also fixes some outdated comments.  And besides adding relevant
test cases, it makes some unification over existing SJE test cases.

Discussion: https://postgr.es/m/CAMbWs4-%3DPO6Mm9gNnySbx0VHyXjgnnYYwbN9dth%3DTLQweZ-M%2Bg%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Andrei Lepikhov, Alexander Korotkov
2024-02-24 00:35:17 +02:00
Tom Lane a6b2a51e16 Avoid dangling-pointer problem with partitionwise joins under GEQO.
build_child_join_sjinfo creates a derived SpecialJoinInfo in
the short-lived GEQO context, but afterwards the semi_rhs_exprs
from that may be used in a UniquePath for a child base relation.
This breaks the expectation that all base-relation-level structures
are in the planning-lifespan context, leading to use of a dangling
pointer with probable ensuing crash later on in create_unique_plan.
To fix, copy the expression trees when making a UniquePath.

Per bug #18360 from Alexander Lakhin.  This has been broken since
partitionwise joins were added, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/18360-a23caf3157f34e62@postgresql.org
2024-02-23 15:21:53 -05:00
Heikki Linnakangas d360e3cc60 Fix compiler warning on typedef redeclaration
bulk_write.c:78:3: error: redefinition of typedef 'BulkWriteState' is a C11 feature [-Werror,-Wtypedef-redefinition]
    } BulkWriteState;
      ^
    ../../../../src/include/storage/bulk_write.h:20:31: note: previous definition is here
    typedef struct BulkWriteState BulkWriteState;
                                  ^
    1 error generated.

Per buildfarm animals 'sifaka' and 'longfin'.

Discussion: https://www.postgresql.org/message-id/9e1f63c3-ef16-404c-b3cb-859a96eaba39@iki.fi
2024-02-23 17:39:27 +02:00
Heikki Linnakangas 8af2565248 Introduce a new smgr bulk loading facility.
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already, this
moves the buffering to the new facility.

The changes to pageinspect GiST test needs an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
2024-02-23 16:10:51 +02:00